pith. machine review for the scientific record.
Pith Number

pith:7ZPEYLCN

pith:2024:7ZPEYLCNXN25DBLLRY7K3WOPVX
not attested · not anchored · not stored · refs resolved

Autoregressive Video Generation without Vector Quantization

Haiwen Diao, Haoge Deng, Huchuan Lu, Shiguang Shan, Ting Pan, Xinlong Wang, Yonggang Qi, Yufeng Cui, Zhengxiong Luo

Video generation can be done autoregressively without vector quantization by predicting frames one after another in time and, within each frame, predicting spatial sets one after another.

arxiv:2412.14169 v2 · 2024-12-18 · cs.CV

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
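The merge algorithm itself is not specified on this page. As a minimal sketch of what "deterministic" can mean here, the Python below merges event lists so that the result does not depend on the order in which mirrors receive them. The `Event` fields, the sort key, and the last-write-wins rule are all hypothetical choices made for illustration, not Pith's actual schema or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not shown on this page.
    event_id: str   # assumed to identify the event's content uniquely
    timestamp: str  # ISO 8601 UTC, so string order equals time order
    field: str
    value: str


def merge(*bundles):
    """Fold events into a state that is independent of arrival order."""
    # Deduplicate by event_id, then impose one total order on the events.
    unique = {e.event_id: e for bundle in bundles for e in bundle}
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    state = {}
    for e in ordered:
        state[e.field] = e.value  # last write wins within the total order
    return state


# Hypothetical events, for illustration only.
mirror_a = [
    Event("e1", "2026-05-17T23:38:13Z", "author_claim", "open"),
    Event("e3", "2026-05-19T08:00:00Z", "author_claim", "claimed"),
]
mirror_b = [
    Event("e2", "2026-05-18T10:00:00Z", "citations", "open"),
    Event("e1", "2026-05-17T23:38:13Z", "author_claim", "open"),
]

print(merge(mirror_a, mirror_b) == merge(mirror_b, mirror_a))
```

Because every mirror sorts the same set of events with the same key, two mirrors holding the same events arrive at the same state regardless of how the events were delivered.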

Claims

C1 · strongest claim

NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity, i.e., 0.6B parameters. NOVA also outperforms state-of-the-art image diffusion models in text-to-image generation tasks, with a significantly lower training cost.

C2 · weakest assumption

That non-quantized autoregressive modeling via temporal frame-by-frame prediction and spatial set-by-set prediction can preserve sufficient visual information and coherence without the discretization step of vector quantization.

C3 · one line summary

NOVA reformulates video generation as non-quantized autoregressive frame-by-frame temporal prediction combined with set-by-set spatial prediction, outperforming prior AR video models and some diffusion models in efficiency and quality.
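To make the ordering in this summary concrete, the sketch below enumerates a generation schedule: an outer loop over frames in time and an inner loop over sets of spatial positions within each frame. It illustrates the nesting only. The function name, the random partition of positions into equal-sized sets, and the parameters are assumptions for illustration, and no model or predictor from the paper is reproduced.

```python
import random


def generation_schedule(num_frames, tokens_per_frame, num_sets, seed=0):
    """Return (frame, step, positions) tuples in generation order.

    Assumes num_sets <= tokens_per_frame so that no set is empty.
    """
    rng = random.Random(seed)
    schedule = []
    for frame in range(num_frames):            # temporal: frame by frame
        positions = list(range(tokens_per_frame))
        rng.shuffle(positions)                 # assumed: random partition
        sets = [sorted(positions[i::num_sets]) for i in range(num_sets)]
        for step, group in enumerate(sets):    # spatial: set by set
            schedule.append((frame, step, group))
    return schedule


schedule = generation_schedule(num_frames=3, tokens_per_frame=8, num_sets=4)
for frame, step, group in schedule:
    print(f"frame {frame}, set {step}: positions {group}")
```

Each frame is completed before the next one starts, and every spatial position of a frame appears in exactly one set.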

References

36 extracted · 36 resolved · 21 Pith anchors

[1] PaLM 2 Technical Report · arXiv:2305.10403
[2] Imagen 3 · arXiv:2408.07009
[3] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets · arXiv:2311.15127
[4] Chameleon: Mixed-Modal Early-Fusion Foundation Models · arXiv:2405.09818
[5] Muse: Text-to-image generation via masked generative transformers

Cited by

17 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:13.741021Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27

Aliases

arxiv: 2412.14169 · arxiv_version: 2412.14169v2 · doi: 10.48550/arxiv.2412.14169 · pith_short_12: 7ZPEYLCNXN25 · pith_short_16: 7ZPEYLCNXN25DBLL · pith_short_8: 7ZPEYLCN
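For this record, the 26-character identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and the `pith_short_*` aliases are prefixes of it. The Python sketch below checks that relationship for the values shown on this page. It is an observation about this one record, not a documented derivation rule.

```python
import base64

# Canonical hash as printed on this page.
canonical_hash = "fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27"

# Standard base32 (alphabet A-Z, 2-7) over the raw 32 hash bytes.
encoded = base64.b32encode(bytes.fromhex(canonical_hash)).decode("ascii")

pith_number = encoded[:26]
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)  # 7ZPEYLCNXN25DBLLRY7K3WOPVX
for name, value in aliases.items():
    print(name, value)
```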
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7ZPEYLCNXN25DBLLRY7K3WOPVX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "42d3b0a85d1c104df17025883ce4255539ab4b297994ec1e168e725c9d51b1b8",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-18T18:59:53Z",
    "title_canon_sha256": "b9c8d759d1cd8ab10e1d038f59278a2d6078dc29101a1a8ed9c5069dd200ab32"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.14169",
    "kind": "arxiv",
    "version": 2
  }
}
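The verification command above can also be run offline against the record as printed here. The following Python sketch applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and hashes the result. Whether the record printed on this page is byte-for-byte what the API returns is an assumption, so the comparison is reported rather than asserted.

```python
import hashlib
import json

# Canonical record copied from this page.
canonical_record = {
    "metadata": {
        "abstract_canon_sha256": "42d3b0a85d1c104df17025883ce4255539ab4b297994ec1e168e725c9d51b1b8",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-12-18T18:59:53Z",
        "title_canon_sha256": "b9c8d759d1cd8ab10e1d038f59278a2d6078dc29101a1a8ed9c5069dd200ab32",
    },
    "schema_version": "1.0",
    "source": {"id": "2412.14169", "kind": "arxiv", "version": 2},
}


def canonical_sha256(record):
    """Serialize with sorted keys and no whitespace, then SHA-256 the bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


expected = "fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27"
digest = canonical_sha256(canonical_record)
print(digest)
print("matches expected:", digest == expected)
```

Sorting keys before hashing makes the digest independent of the key order in which the JSON happens to be stored or transmitted.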