DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick

Arno Solin; Mohammad Hassan Vali; Tom Bäckström

arxiv: 2509.26469 · v3 · submitted 2025-09-30 · 💻 cs.LG

Pith reviewed 2026-05-18 11:53 UTC · model grok-4.3

classification 💻 cs.LG

keywords vector quantization · differentiable quantization · reparameterization trick · VQ-VAE · image compression · speech coding · end-to-end training

The pith

DiVeQ makes vector quantization differentiable by reparameterizing it as an additive error vector.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DiVeQ to solve the non-differentiability of vector quantization in deep learning models. It models the quantization as adding an error vector to the input that approximates the distortion, enabling gradient flow through the reparameterization trick while maintaining hard assignments in the forward pass. The SF-DiVeQ variant uses a space-filling curve formed by connecting codewords to assign inputs, which reduces quantization error and ensures full use of the codebook. These methods allow end-to-end training without auxiliary losses or temperature schedules. Evaluations on VQ-VAE for image compression, VQGAN for image generation, and DAC for speech coding demonstrate improvements in reconstruction quality and sample quality over existing quantization approaches.

Core claim

DiVeQ treats quantization as the addition of an error vector that mimics the quantization distortion. The forward pass remains a hard assignment to codewords, while gradients are computed via the reparameterization trick. The space-filling variant, SF-DiVeQ, builds a curve from the line segments connecting codewords and assigns inputs to that curve, which the paper reports leads to lower quantization error and full codebook usage.
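A minimal sketch of the additive-error-vector view, in plain Python, may make the claim concrete. It follows the update rule quoted in the theorem-link section of this page (error magnitude equal to the input-to-codeword distance, direction aligned with the nearest codeword). The function names, the `noise_std` parameter, and the Gaussian perturbation of the direction are assumptions introduced here for illustration; this is not the authors' implementation.

```python
import math
import random


def nearest_codeword(z, codebook):
    """Return the index of the codeword closest to z in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(z, codebook[i]))


def diveq_forward(z, codebook, noise_std=0.0, rng=None):
    """Quantize z as z + ||c - z|| * unit(direction).

    ASSUMPTION: `noise_std` and the Gaussian perturbation of the direction
    are illustrative only. With noise_std=0 the direction is exactly (c - z),
    so the output equals the nearest codeword up to floating-point error.
    In an autograd framework the unit direction would sit inside a
    stop-gradient, so only the distance term would carry gradients.
    """
    rng = rng or random.Random(0)
    i = nearest_codeword(z, codebook)
    c = codebook[i]
    diff = [cj - zj for cj, zj in zip(c, z)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        # z already sits on a codeword; there is no error to add.
        return list(z), i
    direction = [d + rng.gauss(0.0, noise_std) for d in diff]
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    z_q = [zj + dist * uj for zj, uj in zip(z, unit)]
    return z_q, i
```

With `noise_std=0` the sketch reproduces ordinary nearest-neighbour quantization in the forward pass. With a nonzero value the error vector keeps its length but changes direction, so the output lands on a sphere of radius ||c - z|| around the input rather than exactly on the codeword.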

What carries the argument

The additive error vector model for quantization distortion using the reparameterization trick, and the space-filling curve for codebook assignment.

Load-bearing premise

That the additive error vector captures the quantization distortion accurately enough for stable and effective gradient propagation.

What would settle it

The claim would be falsified by a direct comparison on the same datasets in which DiVeQ yields worse reconstruction or generation metrics than methods relying on straight-through estimators or Gumbel-Softmax.

Original abstract

Vector quantization is common in deep models, yet its hard assignments block gradients and hinder end-to-end training. We propose DiVeQ, which treats quantization as adding an error vector that mimics the quantization distortion, keeping the forward pass hard while letting gradients flow. We also present a space-filling variant (SF-DiVeQ) that assigns input to a curve constructed by the lines connecting codewords, resulting in less quantization error and full codebook usage. Both methods train end-to-end without requiring auxiliary losses or temperature schedules. In VQ-VAE image compression, VQGAN image generation, and DAC speech coding tasks across various data sets, our proposed methods improve reconstruction and sample quality over alternative quantization approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces DiVeQ, a differentiable vector quantization technique that models the quantization step as the addition of an error vector mimicking the distortion, using the reparameterization trick to permit gradient flow while preserving exact hard assignments in the forward pass. It further proposes SF-DiVeQ, which constructs a space-filling curve from lines connecting codewords for assignment, aiming for reduced quantization error and complete codebook utilization. Both variants are presented as enabling fully end-to-end training without auxiliary losses or temperature schedules. Empirical evaluations on VQ-VAE image compression, VQGAN image generation, and DAC speech coding across datasets report improved reconstruction and sample quality relative to alternative quantization methods.

Significance. If the reparameterization indeed supplies stable, unbiased gradients that reliably substitute for auxiliary losses while maintaining hard forward behavior, the approach could meaningfully simplify optimization of vector-quantized generative models and mitigate codebook collapse issues. The space-filling assignment offers a concrete mechanism for better utilization, and the reported gains across image and speech domains indicate practical relevance if the gradient properties are verified.
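Codebook collapse and utilization can be checked directly from the indices a quantizer assigns. The sketch below computes two common diagnostics, the fraction of codewords used and the perplexity of the usage distribution. The function name and the choice of diagnostics are assumptions made here; the source does not state which utilization measures the paper reports.

```python
import math
from collections import Counter


def codebook_usage(indices, codebook_size):
    """Return (used_fraction, perplexity) for a list of assigned indices.

    used_fraction: share of codewords selected at least once.
    perplexity: exp of the entropy of the empirical usage distribution;
    it equals codebook_size when all codewords are used equally often
    and approaches 1 when a single codeword dominates.
    """
    counts = Counter(indices)
    total = len(indices)
    used_fraction = len(counts) / codebook_size
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return used_fraction, math.exp(entropy)
```

A collapsed codebook shows up as a small used fraction and a perplexity far below the codebook size.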

major comments (2)

The central claim that the mimicking error vector produces gradients stable and unbiased enough to drop auxiliary losses and temperature schedules lacks a derivation showing that the expected gradient equals the true subgradient of the quantization operator or any bias/variance analysis in high-dimensional latent spaces. This is load-bearing for the end-to-end training assertion.
Experiments section: reported improvements in reconstruction and sample quality are presented without error bars, multiple random seeds, or statistical tests, making it impossible to determine whether gains are robust or attributable to incidental codebook utilization rather than the claimed gradient property.
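
For orientation on the first major comment, the Jacobians implied by the update rule quoted in the theorem-link section of this page can be written down under two stated assumptions: the selected index $i^*$ is locally constant, and the stop-gradient argument is treated as a constant unit vector $u$. The symbols $u$ and $e$ are introduced here and are not the paper's notation.

```latex
\begin{aligned}
z_q &= z + \lVert c_{i^*} - z \rVert_2 \, u,
\qquad u = \mathrm{sg}\!\left[\frac{v_d}{\lVert v_d \rVert_2}\right],
\qquad e = \frac{c_{i^*} - z}{\lVert c_{i^*} - z \rVert_2}, \\[4pt]
\frac{\partial z_q}{\partial z} &= I - u\, e^{\top},
\qquad
\frac{\partial z_q}{\partial c_{i^*}} = u\, e^{\top}.
\end{aligned}
```

For comparison, the straight-through form $z_q = z + \mathrm{sg}[c_{i^*} - z]$ gives $\partial z_q / \partial z = I$ and $\partial z_q / \partial c_{i^*} = 0$, so the quantization step itself passes no gradient to the codebook. This sketch only records what the formula implies pointwise; it does not address the bias or variance of the resulting gradient estimate, which is what the comment asks the authors to supply.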

minor comments (2)

The description of the space-filling curve assignment would benefit from an explicit algorithmic outline or pseudocode to clarify how inputs are mapped to curve segments.
Notation for the error vector and its sampling distribution is introduced without a dedicated equation defining its distribution parameters or sampling procedure.
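
As an indication of the kind of outline the first minor comment asks for, one plausible reading of "a curve constructed by the lines connecting codewords" is nearest-point projection onto a polyline. The sketch below assumes the codewords are connected in index order and that an input is assigned to the closest point on any segment. Both choices, and all function names, are assumptions made here; the paper's actual assignment rule may differ.

```python
import math


def project_to_segment(z, a, b):
    """Closest point to z on segment a -> b.

    Returns (point, lam) with point = (1 - lam) * a + lam * b and
    lam clipped to [0, 1].
    """
    ab = [bj - aj for aj, bj in zip(a, b)]
    az = [zj - aj for aj, zj in zip(a, z)]
    denom = sum(d * d for d in ab)
    lam = 0.0 if denom == 0.0 else sum(p * q for p, q in zip(az, ab)) / denom
    lam = min(1.0, max(0.0, lam))
    point = [(1.0 - lam) * aj + lam * bj for aj, bj in zip(a, b)]
    return point, lam


def assign_to_curve(z, codebook):
    """Map z to the nearest point on the polyline c_0 -> c_1 -> ... -> c_{K-1}.

    ASSUMPTION: consecutive codewords (by index) are joined by straight
    segments. Returns (point, segment_index, lam).
    """
    best = None
    for i in range(len(codebook) - 1):
        point, lam = project_to_segment(z, codebook[i], codebook[i + 1])
        d = math.dist(z, point)
        if best is None or d < best[0]:
            best = (d, i, lam, point)
    _, i, lam, point = best
    return point, i, lam
```

Because every codeword is an endpoint of some segment, the distance to the curve is never larger than the distance to the nearest codeword, which is consistent with the reduced quantization error the paper claims for the space-filling variant.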

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and describe the revisions we intend to incorporate.

Point-by-point responses

Referee: The central claim that the mimicking error vector produces gradients stable and unbiased enough to drop auxiliary losses and temperature schedules lacks a derivation showing that the expected gradient equals the true subgradient of the quantization operator or any bias/variance analysis in high-dimensional latent spaces. This is load-bearing for the end-to-end training assertion.

Authors: We acknowledge the value of a formal derivation for the gradient properties. In the revised manuscript we will add a dedicated subsection in the Methods that derives the expected gradient under the error-vector reparameterization and relates it to the subgradient of the hard quantization operator. We will also include a brief bias/variance discussion for high-dimensional latent spaces together with supporting gradient-norm measurements from the trained models.
revision: yes

Referee: Experiments section: reported improvements in reconstruction and sample quality are presented without error bars, multiple random seeds, or statistical tests, making it impossible to determine whether gains are robust or attributable to incidental codebook utilization rather than the claimed gradient property.

Authors: We agree that the experimental presentation would be strengthened by statistical rigor. In the revised Experiments section we will report results averaged over at least five independent random seeds, include error bars (standard deviation), and add statistical significance tests comparing DiVeQ / SF-DiVeQ against the baselines. We will also briefly discuss the extent to which observed gains can be attributed to improved gradient flow versus improved codebook utilization.
revision: yes
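
The seed-level reporting described in this response could be computed along the following lines. This is a sketch using only the Python standard library; Welch's t-test is chosen here as one reasonable option, since the response does not name a specific test, and the function names are hypothetical.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    `a` and `b` are per-seed scores for two methods; the two groups
    are not assumed to have equal variances.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard errors of the two means.
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df
```

Each list would hold one metric value per random seed for a given method. Converting the statistic to a p-value requires the Student t distribution, which the standard library does not provide, so that step is left out of the sketch.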

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents DiVeQ as a direct modeling choice that treats quantization distortion as an additive error vector via the standard reparameterization trick, preserving a hard forward pass while enabling gradient flow, and introduces SF-DiVeQ via explicit space-filling curve assignment on codeword lines. Neither the abstract nor the described method reduces any claimed result (gradient stability, codebook utilization, or empirical gains on VQ-VAE/VQGAN/DAC) to a fitted parameter renamed as prediction, a self-referential definition, or a load-bearing self-citation chain. The central premise is an ansatz for differentiable quantization that stands as an independent proposal rather than an algebraic identity with its inputs; reported improvements are positioned as empirical outcomes, not forced by construction. This matches the default expectation of a self-contained contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that an additive error vector can faithfully mimic hard quantization for gradient purposes and on the modeling choice of a space-filling curve; both are introduced without independent evidence beyond the reported task improvements.

axioms (1)

domain assumption: The reparameterization trick can be applied to an additive error vector so that the forward pass remains identical to hard vector quantization while gradients flow.
This is the key modeling step that lets the method keep a hard forward pass.

invented entities (2)

Mimicking error vector · no independent evidence
purpose: To approximate quantization distortion differentiably
New modeling device introduced to bypass the non-differentiable argmin operation.

Space-filling curve connecting codewords · no independent evidence
purpose: To achieve continuous assignment and full codebook utilization
Novel geometric construction proposed for the SF-DiVeQ variant.

pith-pipeline@v0.9.0 · 5654 in / 1529 out tokens · 43409 ms · 2026-05-18T11:53:17.007194+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

DiVeQ models quantization as adding a simulated error vector whose magnitude equals the input–codeword distance and whose direction is aligned with the nearest codeword... z_q = z + ||c_i* - z||₂ · sg[v_d / ||v_d||₂]
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

SF-DiVeQ quantizes along codeword connections... z_q = z + ||c_i* - z||₂ · sg[(1-λ)v_d i* / ||v_d i*||₂] + ...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.