Pith Number

pith:OVTZMQ4L

pith:2025:OVTZMQ4L5I6FT2X6KD52IQZJDF
not attested · not anchored · not stored · refs resolved

History-Guided Video Diffusion

Boyuan Chen, Kiwhan Song, Max Simchowitz, Russ Tedrake, Vincent Sitzmann, Yilun Du

Diffusion Forcing Transformer lets video models condition on any number of past frames.

arxiv:2502.06764 v2 · 2025-02-10 · cs.LG · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OVTZMQ4L5I6FT2X6KD52IQZJDF}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
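The page does not describe the merge algorithm itself, so the following is only a generic sketch of what "deterministic merge" can mean: events are applied in a fixed total order, so any mirror holding the same set of events reaches the same state regardless of the order in which it received them. The event fields (`event_id`, `timestamp`, `field`, `value`) and the function `merge_events` are hypothetical and are not taken from the Pith bundle format.

```python
def merge_events(events):
    """Fold events into a state dict using a fixed total order.

    Hypothetical event shape (an assumption, not the Pith schema):
    {"event_id": str, "timestamp": str, "field": str, "value": str}
    """
    # Drop exact duplicates by event_id so re-delivered events do not matter.
    unique = {e["event_id"]: e for e in events}.values()
    # Sort by (timestamp, event_id): ties on timestamp are broken by the id,
    # so the order never depends on arrival order.
    ordered = sorted(unique, key=lambda e: (e["timestamp"], e["event_id"]))
    state = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # later events win
    return state


# Hypothetical events, invented for illustration only.
events = [
    {"event_id": "e1", "timestamp": "2026-05-17T23:38:47Z", "field": "author_claim", "value": "open"},
    {"event_id": "e2", "timestamp": "2026-05-18T09:00:00Z", "field": "author_claim", "value": "claimed"},
    {"event_id": "e3", "timestamp": "2026-05-18T09:00:00Z", "field": "citations", "value": "open"},
]

state_a = merge_events(events)
state_b = merge_events(list(reversed(events)) + [events[0]])  # shuffled, with a duplicate
print(state_a == state_b)
```

Sorting on a composite key is one common way to get order independence; a real implementation would also need to verify each event's signature before applying it.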

Claims

C1 · strongest claim

We propose the Diffusion Forcing Transformer (DFoT), a video diffusion architecture and theoretically grounded training objective that jointly enable conditioning on a flexible number of history frames. We then introduce History Guidance, a family of guidance methods uniquely enabled by DFoT.

C2 · weakest assumption

That the DFoT training objective and architecture truly support arbitrary-length history without hidden performance costs or instability, and that the proposed history guidance methods generalize beyond the tested datasets and lengths.

C3 · one line summary

DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.

References

70 extracted · 70 resolved · 20 Pith anchors


Cited by

28 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:47.953184Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

756796438bea3c59eafe50fba44329197d7363b68df6f6c2c614f33ca7b2c00e

Aliases

arxiv: 2502.06764 · arxiv_version: 2502.06764v2 · doi: 10.48550/arxiv.2502.06764 · pith_short_12: OVTZMQ4L5I6F · pith_short_16: OVTZMQ4L5I6FT2X6 · pith_short_8: OVTZMQ4L
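The short aliases listed here are prefixes of the full Pith Number. In addition, for this particular record the full number coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That second relationship is an observation about the values shown on this page, not a rule the page states, so treat it as an assumption. A minimal sketch in Python (the helper names `short_alias` and `base32_prefix` are made up for illustration):

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "756796438bea3c59eafe50fba44329197d7363b68df6f6c2c614f33ca7b2c00e"
PITH_NUMBER = "OVTZMQ4L5I6FT2X6KD52IQZJDF"


def short_alias(pith_number: str, length: int) -> str:
    """Return the first `length` characters of a Pith Number."""
    return pith_number[:length]


def base32_prefix(hex_digest: str, length: int) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = {n: short_alias(PITH_NUMBER, n) for n in (8, 12, 16)}
print(aliases)

# Assumption being checked: number == base32(canonical hash)[:26] for this record.
matches = base32_prefix(CANONICAL_HASH, len(PITH_NUMBER)) == PITH_NUMBER
print(matches)
```

For the values above the comparison holds, and the three aliases match `pith_short_8`, `pith_short_12`, and `pith_short_16`.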
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OVTZMQ4L5I6FT2X6KD52IQZJDF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 756796438bea3c59eafe50fba44329197d7363b68df6f6c2c614f33ca7b2c00e
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "7f67076de87788c69b47d5551c71b2d7952f4d9b071ccc3f97727c58fedf0259",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2025-02-10T18:44:25Z",
    "title_canon_sha256": "cd40cad7c6e5ff3cfb9fe443a7080dd443898d14479092a35d08ed647021426f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2502.06764",
    "kind": "arxiv",
    "version": 2
  }
}
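The verification one-liner above re-serializes the canonical record with sorted keys and compact separators, then hashes the UTF-8 bytes with SHA-256. The same steps can be sketched as a small Python script. The function name `canonical_sha256` is illustrative, and the record literal is copied from the JSON shown here on the assumption that it is identical to what the API returns under `canonical_record`.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then SHA-256 the bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the page; assumed to match the API's canonical_record.
record = {
    "metadata": {
        "abstract_canon_sha256": "7f67076de87788c69b47d5551c71b2d7952f4d9b071ccc3f97727c58fedf0259",
        "cross_cats_sorted": ["cs.CV"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2025-02-10T18:44:25Z",
        "title_canon_sha256": "cd40cad7c6e5ff3cfb9fe443a7080dd443898d14479092a35d08ed647021426f",
    },
    "schema_version": "1.0",
    "source": {"id": "2502.06764", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)  # compare against the canonical hash listed on this page
```

Because `sort_keys=True` fixes the key order, the digest does not depend on how the record was ordered when it was parsed, which is what lets independent parties recompute the same hash.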