Pith Number

pith:U52T37GU

pith:2022:U52T37GUSXOPBDROGA4KV2MDWC
not attested · not anchored · not stored · refs resolved

Latent Video Diffusion Models for High-Fidelity Long Video Generation

Qifeng Chen, Tianyu Yang, Yingqing He, Ying Shan, Yong Zhang

Video diffusion models shift to a low-dimensional 3D latent space to generate realistic clips longer than 1000 frames with modest compute.

arxiv:2211.13221 v2 · 2022-11-23 · cs.CV · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{U52T37GUSXOPBDROGA4KV2MDWC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

we introduce lightweight video diffusion models by leveraging a low-dimensional 3D latent space, significantly outperforming previous pixel-space video diffusion models under a limited computational budget... hierarchical diffusion in the latent space such that longer videos with more than one thousand frames can be produced... conditional latent perturbation and unconditional guidance that effectively mitigate the accumulated errors during the extension of video length.

C2 weakest assumption

The low-dimensional 3D latent space preserves sufficient spatial-temporal detail for high-fidelity generation, and the added perturbation and guidance steps prevent error accumulation without introducing new artifacts or inconsistencies.

C3 one line summary

Latent-space hierarchical diffusion models with targeted error-correction techniques generate realistic videos exceeding 1000 frames while using less compute than prior pixel-space approaches.

References

48 extracted · 48 resolved · 15 Pith anchors

[1] Large scale GAN training for high fidelity natural image synthesis 2019
[2] Generating long videos of dynamic scenes 2022
[3] Hierarchical video generation for complex data 2021
[4] Diffusion models beat gans on image synthesis 2021
[5] Taming transformers for high-resolution image synthesis 2021

Formal links

2 machine-checked theorem links

Cited by

41 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:53.534898Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

a7753dfcd495dcf08e2e3038aae983b0816e34f4ab1fadbd1e3ba9fe6640db33

Aliases

arxiv: 2211.13221 · arxiv_version: 2211.13221v2 · doi: 10.48550/arxiv.2211.13221 · pith_short_12: U52T37GUSXOP · pith_short_16: U52T37GUSXOPBDRO · pith_short_8: U52T37GU
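The short aliases above are prefixes of the full Pith Number. In this record the Pith Number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. Whether that is the defined derivation is an assumption, since the page does not state it. A minimal Python sketch of both checks follows; the helper names `base32_prefix` and `short_alias` are hypothetical and not part of any Pith API.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "a7753dfcd495dcf08e2e3038aae983b0816e34f4ab1fadbd1e3ba9fe6640db33"
PITH_NUMBER = "U52T37GUSXOPBDROGA4KV2MDWC"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest (RFC 4648 alphabet) and keep the first `length` chars.

    Assumption: the Pith Number is a truncated Base32 form of the canonical hash.
    This holds for the values in this record but is not documented on the page.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters, as in pith_short_8/12/16."""
    return pith_number[:length]


if __name__ == "__main__":
    print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
    for n in (8, 12, 16):
        print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
```

A prefix check like this only confirms internal consistency of the identifiers; it does not replace recomputing the canonical hash from the record itself.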
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/U52T37GUSXOPBDROGA4KV2MDWC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a7753dfcd495dcf08e2e3038aae983b0816e34f4ab1fadbd1e3ba9fe6640db33
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6dbcccf4bb7c02fbfe9928bc7b713e502af96cf6459bfa583f91f5be25e07262",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2022-11-23T18:58:39Z",
    "title_canon_sha256": "c6faa78873c360f2d65fa170a921710b4b4f23535ada711afe37d86bc2dc53c3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2211.13221",
    "kind": "arxiv",
    "version": 2
  }
}
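
The verification one-liner above can be unpacked into a small offline script. This is a sketch that mirrors the same steps (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256) and assumes the JSON shown above is the complete canonical record. The names `canonical_sha256` and `matches_expected` are hypothetical helpers, not part of a Pith client library.

```python
import hashlib
import json

# Canonical record copied from the JSON above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "6dbcccf4bb7c02fbfe9928bc7b713e502af96cf6459bfa583f91f5be25e07262",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2022-11-23T18:58:39Z",
        "title_canon_sha256": "c6faa78873c360f2d65fa170a921710b4b4f23535ada711afe37d86bc2dc53c3",
    },
    "schema_version": "1.0",
    "source": {"id": "2211.13221", "kind": "arxiv", "version": 2},
}

# Expected value as listed under "Canonical hash".
EXPECTED = "a7753dfcd495dcf08e2e3038aae983b0816e34f4ab1fadbd1e3ba9fe6640db33"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_expected(record: dict, expected: str = EXPECTED) -> bool:
    """Compare the recomputed digest with the published canonical hash."""
    return canonical_sha256(record) == expected


if __name__ == "__main__":
    print(canonical_sha256(RECORD))
    print("match:", matches_expected(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the key order in which the record is stored or served. A mismatch would suggest either a modified record or a difference in serialization rules.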