Pith Number

pith:7ZPEYLCN

pith:2024:7ZPEYLCNXN25DBLLRY7K3WOPVX

not attested · not anchored · not stored · refs resolved

Autoregressive Video Generation without Vector Quantization

Haiwen Diao, Haoge Deng, Huchuan Lu, Shiguang Shan, Ting Pan, Xinlong Wang, Yonggang Qi, Yufeng Cui, Zhengxiong Luo

Video generation can be done autoregressively without vector quantization by predicting frames one after another in time and sets of tokens one after another within each frame.

arxiv:2412.14169 v2 · 2024-12-18 · cs.CV


Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
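The page does not specify the merge algorithm, so the following is only a minimal sketch of what "deterministic" could mean here: events are deduplicated and folded in a canonical order, so any mirror holding the same set of events arrives at the same state regardless of arrival order. The event fields (`id`, `ts`, `field`, `value`) and the function name `merge_state` are hypothetical and are not taken from the Pith schema; signature checking is omitted.

```python
from typing import Any


def merge_state(record: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state dict in a canonical order (hypothetical event schema)."""
    # Deduplicate by event id so replayed or mirrored copies do not change the result.
    unique = {ev["id"]: ev for ev in events}
    state: dict[str, Any] = {"record": record, "fields": {}}
    # Sorting by (timestamp, id) removes any dependence on arrival order.
    for ev in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state["fields"][ev["field"]] = ev["value"]
    return state


# Hypothetical events, used only to illustrate order independence.
events = [
    {"id": "e1", "ts": "2026-05-17T23:40:00Z", "field": "author_claim", "value": "open"},
    {"id": "e2", "ts": "2026-05-18T08:00:00Z", "field": "author_claim", "value": "claimed"},
    {"id": "e3", "ts": "2026-05-18T09:00:00Z", "field": "citations", "value": 17},
]
record = {"source": {"id": "2412.14169", "kind": "arxiv", "version": 2}}

state_a = merge_state(record, events)
state_b = merge_state(record, list(reversed(events)) + events[:1])
print(state_a == state_b)
```

Two mirrors that receive the same events in different orders, or with duplicates, compute equal states under this scheme.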

Claims

C1 · strongest claim

NOVA surpasses prior autoregressive video models in data efficiency, inference speed, visual fidelity, and video fluency, even with a much smaller model capacity, i.e., 0.6B parameters. NOVA also outperforms state-of-the-art image diffusion models in text-to-image generation tasks, with a significantly lower training cost.

C2 · weakest assumption

That non-quantized autoregressive modeling via temporal frame-by-frame prediction and spatial set-by-set prediction can preserve sufficient visual information and coherence without the discretization step of vector quantization.

C3 · one line summary

NOVA reformulates video generation as non-quantized autoregressive frame-by-frame temporal prediction combined with set-by-set spatial prediction, outperforming prior AR video models and some diffusion models in efficiency and quality.
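To make the two nested orderings concrete, here is a toy sketch of a generation schedule: frames are visited strictly in temporal order, and within each frame the token positions are split into sets that are produced one set at a time. It illustrates the ordering only, not the model. The random partition of positions, the equal set sizes, and the name `generation_schedule` are assumptions made for illustration and are not stated on this page.

```python
import random


def generation_schedule(
    num_frames: int, tokens_per_frame: int, num_sets: int, seed: int = 0
) -> list[tuple[int, list[int]]]:
    """Return (frame_index, token_positions) steps in the order they would be generated."""
    rng = random.Random(seed)
    set_size = -(-tokens_per_frame // num_sets)  # ceiling division
    schedule = []
    for frame in range(num_frames):  # temporal axis: frame by frame
        positions = list(range(tokens_per_frame))
        rng.shuffle(positions)  # assumed: sets are a random partition of positions
        for start in range(0, tokens_per_frame, set_size):  # spatial axis: set by set
            schedule.append((frame, sorted(positions[start:start + set_size])))
    return schedule


schedule = generation_schedule(num_frames=3, tokens_per_frame=16, num_sets=4)
for frame, token_set in schedule:
    print(frame, token_set)
```

Each step would condition on all earlier frames and on the sets already produced in the current frame; every token position in every frame is produced exactly once.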

References

36 extracted · 36 resolved · 21 Pith anchors

[1] PaLM 2 Technical Report · arXiv:2305.10403

[2] Imagen 3 · arXiv:2408.07009

[3] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets · arXiv:2311.15127

[4] Chameleon: Mixed-Modal Early-Fusion Foundation Models · arXiv:2405.09818

[5] Muse: Text-to-image generation via masked generative transformers

Cited by

17 papers in Pith

Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation

Receipt and verification

First computed	2026-05-17T23:38:13.741021Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27

Aliases

arxiv: 2412.14169 · arxiv_version: 2412.14169v2 · doi: 10.48550/arxiv.2412.14169 · pith_short_12: 7ZPEYLCNXN25 · pith_short_16: 7ZPEYLCNXN25DBLL · pith_short_8: 7ZPEYLCN
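The short aliases above are prefixes of the full identifier, and for this record the full identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. Whether that is the general derivation rule is an assumption drawn from this single record; the sketch below only reproduces the observation, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27"
FULL_ID_LENGTH = 26  # length of the full identifier shown on this page


def pith_id_from_hash(hex_digest: str, length: int = FULL_ID_LENGTH) -> str:
    """Base32-encode the digest bytes and keep the leading characters (assumed rule)."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}

print(full_id)  # 7ZPEYLCNXN25DBLLRY7K3WOPVX
for name, value in aliases.items():
    print(name, value)
```

The printed aliases can be compared directly against the `pith_short_8`, `pith_short_12`, and `pith_short_16` values listed above.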

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/7ZPEYLCNXN25DBLLRY7K3WOPVX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fe5e4c2c4dbb75d1856b8e3eadd9cfadea2a131a47c7adf1e52372c690a70f27

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "42d3b0a85d1c104df17025883ce4255539ab4b297994ec1e168e725c9d51b1b8",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-18T18:59:53Z",
    "title_canon_sha256": "b9c8d759d1cd8ab10e1d038f59278a2d6078dc29101a1a8ed9c5069dd200ab32"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.14169",
    "kind": "arxiv",
    "version": 2
  }
}
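
The shell pipeline under "Verify this Pith Number yourself" can also be mirrored offline. The sketch below applies the same serialization (sorted keys, compact separators, non-ASCII preserved, UTF-8) to the record as printed on this page and hashes it with SHA-256. It assumes the JSON shown here is the complete canonical record returned by the resolver, which is not confirmed by this page, so the digest should be compared against the Canonical hash rather than taken on trust. The function name `canonical_hash` is hypothetical.

```python
import hashlib
import json

# Canonical record copied from this page (assumed complete).
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "42d3b0a85d1c104df17025883ce4255539ab4b297994ec1e168e725c9d51b1b8",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-12-18T18:59:53Z",
    "title_canon_sha256": "b9c8d759d1cd8ab10e1d038f59278a2d6078dc29101a1a8ed9c5069dd200ab32"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2412.14169",
    "kind": "arxiv",
    "version": 2
  }
}
"""


def canonical_hash(record: dict) -> str:
    """SHA-256 of the key-sorted, compact JSON form used by the shell one-liner."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # UTF-8 by default
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)  # compare with the Canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the key order or whitespace of the JSON that was fetched.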