Pith Number

pith:U5KALI76

pith:2023:U5KALI76GX5OZF7IVTQ3GLLULE
not attested · not anchored · not stored · refs resolved

Finite Scalar Quantization: VQ-VAE Made Simple

David Minnen, Eirikur Agustsson, Fabian Mentzer, Michael Tschannen

FSQ replaces vector quantization in VQ-VAEs by projecting latents to a few dimensions and quantizing each independently to fixed levels.

arxiv:2309.15505 v2 · 2023-09-27 · cs.CV · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{U5KALI76GX5OZF7IVTQ3GLLULE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.

C2 · weakest assumption

That projecting the VAE latent to a small number of dimensions (typically fewer than 10) and quantizing each independently to fixed levels preserves sufficient representational capacity for the downstream tasks to match VQ performance.

C3 · one line summary

Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels. The product of the per-dimension level sets forms an implicit codebook that can be sized to match a VQ codebook, with competitive performance and no codebook collapse.
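The quantization step summarized in the claims above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`fsq_quantize`, `code_to_index`, `codebook_size`) are hypothetical, the `tanh` bounding is one simple way to keep each dimension within its level range, and the sketch assumes an odd number of levels per dimension. How gradients pass through the rounding step during training is not covered here.

```python
import math


def fsq_quantize(z, levels):
    """Round each latent dimension independently to one of a fixed set of integer levels.

    Assumption of this sketch: every entry of `levels` is odd, so the
    levels are the integers -(L-1)/2 .. (L-1)/2 for that dimension.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels < 1 or num_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half_width = (num_levels - 1) / 2
        # Squash the unbounded latent into [-half_width, half_width], then round.
        bounded = half_width * math.tanh(value)
        codes.append(round(bounded))
    return codes


def code_to_index(codes, levels):
    """Map a tuple of per-dimension codes to one integer index (mixed-radix encoding)."""
    index = 0
    for code, num_levels in zip(codes, levels):
        digit = code + (num_levels - 1) // 2  # shift into 0 .. num_levels - 1
        index = index * num_levels + digit
    return index


def codebook_size(levels):
    """Size of the implicit codebook: the product of the per-dimension level counts."""
    return math.prod(levels)


levels = [5, 5, 5]
codes = fsq_quantize([0.0, 10.0, -10.0], levels)
print(codes)                         # [0, 2, -2]
print(code_to_index(codes, levels))  # 70
print(codebook_size(levels))         # 125
```

No codebook is stored or learned in this sketch: the set of reachable codes is fixed by `levels` alone, which is why the codebook is described as implicit.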

References

22 extracted · 22 resolved · 8 Pith anchors

[1] Cm3: A causal masked multimodal model of the internet
[2] Scaling laws for generative mixed-modal language models · arXiv:2301.03728
[3] High Quality Monocular Depth Estimation via Transfer Learning · arXiv:1812.11941
[4] End-to-end optimized image compression
[5] Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation · arXiv:1308.3432

Formal links

2 machine-checked theorem links

Cited by

28 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:46.967163Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a75405a3fe35faec97e8ace1b32d7459196712903675423b00ec02d36aade776

Aliases

arxiv: 2309.15505 · arxiv_version: 2309.15505v2 · doi: 10.48550/arxiv.2309.15505 · pith_short_12: U5KALI76GX5O · pith_short_16: U5KALI76GX5OZF7I · pith_short_8: U5KALI76
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/U5KALI76GX5OZF7IVTQ3GLLULE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a75405a3fe35faec97e8ace1b32d7459196712903675423b00ec02d36aade776
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "38f0f105f4217052b88ab6b5dbfc6a4369c298813925934c25e84d837197197d",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-09-27T09:13:40Z",
    "title_canon_sha256": "62083644e49d78ab01722dcc2435bd6d5edf475e93cf1896c3bc04031e788647"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2309.15505",
    "kind": "arxiv",
    "version": 2
  }
}
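The hashing step of the shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record listed above. The sketch below is a standalone illustration: `canonical_sha256` is a hypothetical helper name, and the serialization settings are copied from the one-liner (sorted keys, compact separators, non-ASCII characters left unescaped). Whether the digest equals the published canonical hash depends on the listing above matching the served `canonical_record` exactly, so the comparison is printed rather than assumed.

```python
import hashlib
import json


def canonical_sha256(record):
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    ).encode()                   # UTF-8 bytes
    return hashlib.sha256(payload).hexdigest()


# The canonical record as listed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "38f0f105f4217052b88ab6b5dbfc6a4369c298813925934c25e84d837197197d",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-09-27T09:13:40Z",
        "title_canon_sha256": "62083644e49d78ab01722dcc2435bd6d5edf475e93cf1896c3bc04031e788647",
    },
    "schema_version": "1.0",
    "source": {"id": "2309.15505", "kind": "arxiv", "version": 2},
}

# The published canonical hash, copied from this page.
published = "a75405a3fe35faec97e8ace1b32d7459196712903675423b00ec02d36aade776"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == published)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered when it is served.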