pith:L3NA6IVR
Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity
Attention heads in autoregressive video diffusion transformers naturally divide into local, anchor, and memory roles. The training-free Head Forcing method exploits this split to generate minute-long videos by assigning each head type a specialized KV cache strategy.
arxiv:2605.14487 v1 · 2026-05-14 · cs.CV · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{L3NA6IVRNT557KRRZYRICUT76V}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Without additional training, Head Forcing extends generation from 5 seconds to minute-level duration, supports multi-prompt interactive synthesis, and consistently outperforms existing baselines.
The method rests on the premise that attention heads in AR video diffusion transformers naturally and reliably fall into distinct functional categories (local for detail refinement, anchor for structural stabilization, memory for long-range context), and that these categories can be identified and assigned effective, tailored KV cache strategies without any model-specific training or validation.
Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
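The claims above describe the per-head cache policies only at a high level. As a purely illustrative sketch, the snippet below picks which cached frame indices each head type might keep. The concrete policies (a sliding window for local heads, the first frames plus a recent window for anchor heads, strided subsampling plus a recent window for memory heads), the function name `select_cache_frames`, and every parameter are assumptions made for illustration; they are not taken from the paper.

```python
def select_cache_frames(head_type, num_frames, window=4, num_anchor=1, stride=8):
    """Return the sorted indices of cached frames one head keeps.

    Hypothetical policy, not the paper's algorithm:
      local  -> only the most recent `window` frames
      anchor -> the first `num_anchor` frames plus the recent window
      memory -> every `stride`-th frame plus the recent window
    """
    if num_frames <= 0:
        return []

    # Every head type keeps the most recent frames in this sketch.
    recent = set(range(max(0, num_frames - window), num_frames))

    if head_type == "local":
        kept = recent
    elif head_type == "anchor":
        kept = set(range(min(num_anchor, num_frames))) | recent
    elif head_type == "memory":
        kept = set(range(0, num_frames, stride)) | recent
    else:
        raise ValueError(f"unknown head type: {head_type!r}")

    return sorted(kept)


if __name__ == "__main__":
    for head_type in ("local", "anchor", "memory"):
        print(head_type, select_cache_frames(head_type, num_frames=20))
```

Under these assumed policies, the cache size for local and anchor heads stays bounded as the video grows, while memory heads grow slowly with the stride; the actual trade-offs depend on the strategies the paper defines.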
Receipt and verification
| First computed | 2026-05-17T23:39:06.477970Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
5eda0f22b16cfbdfaa31ce2281527ff57f9b9bd32944180336922f00e088c04d
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/L3NA6IVRNT557KRRZYRICUT76V \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5eda0f22b16cfbdfaa31ce2281527ff57f9b9bd32944180336922f00e088c04d
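The same check can be written as a small standalone Python script, which may be easier to reuse than the shell one-liner. This is a minimal sketch that mirrors the canonicalization used in the command above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256). The function names `canonical_sha256` and `verify` are hypothetical, and the record is assumed to be already fetched and parsed into a Python dict.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a parsed record the same way as the shell one-liner above."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


def verify(record, expected_hex):
    """Return True if the record hashes to the expected hex digest."""
    return canonical_sha256(record) == expected_hex.lower()


if __name__ == "__main__":
    # Toy record for demonstration only; substitute the fetched
    # canonical record and the published canonical hash.
    toy = {"schema_version": "1.0", "source": {"kind": "arxiv", "version": 1}}
    digest = canonical_sha256(toy)
    print(digest, verify(toy, digest))
```

To check a real record, pass the parsed `canonical_record` object and compare against the canonical hash listed on this page.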
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1d4ec4f37b71cade6bb4662ffbf739d1161d570f08b28cb2cb42a080a6ea2a43",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T07:27:39Z",
    "title_canon_sha256": "37768ef9866f318c71d0fcbfc4e35c6f2acaca235c13e38eeda1ee854f4d176b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14487",
    "kind": "arxiv",
    "version": 1
  }
}