pith:HFQBZVID
Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models
LocalDPO aligns text-to-video diffusion models by optimizing preferences only on locally corrupted regions of real videos.
arxiv:2601.04068 v4 · 2026-01-07 · cs.CV · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HFQBZVIDLX4CUCQZJBDV3CIAEL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Experiments on Wan2.1 and CogVideoX demonstrate that LocalDPO consistently improves video fidelity, temporal coherence and human preference scores over other post-training approaches, establishing a more efficient and fine-grained paradigm for video generator alignment.
The method rests on the premise that videos created by locally masking real footage and inpainting only the masked regions with the frozen base model yield negatives whose flaws match the kinds of region-level errors humans actually dislike.
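The following is a minimal sketch of how such a localized negative could be assembled. It assumes pixel-space arrays of shape (T, H, W, C) and a single random box-shaped spatio-temporal mask. The names `random_spatiotemporal_mask`, `make_localized_negative`, `stub_inpaint`, `max_frac`, and `inpaint_fn` are hypothetical. In LocalDPO the masked region is restored by the frozen base model, which is replaced here by a placeholder stub. The paper's actual mask distribution, and whether masking happens in pixel or latent space, may differ.

```python
import numpy as np


def random_spatiotemporal_mask(shape, rng, max_frac=0.5):
    """Return a (T, H, W) float mask with one random box set to 1.

    Hypothetical scheme: a single axis-aligned box covering a contiguous run
    of frames, each extent at most `max_frac` of its axis (and at least 1).
    """
    T, H, W = shape
    mask = np.zeros((T, H, W), dtype=np.float32)
    # Box extents along time, height, width (rng.integers excludes the upper bound).
    t_len, h_len, w_len = (
        rng.integers(1, max(1, int(n * max_frac)) + 1) for n in (T, H, W)
    )
    # Box origin chosen so the box stays inside the video.
    t0 = rng.integers(0, T - t_len + 1)
    y0 = rng.integers(0, H - h_len + 1)
    x0 = rng.integers(0, W - w_len + 1)
    mask[t0:t0 + t_len, y0:y0 + h_len, x0:x0 + w_len] = 1.0
    return mask


def make_localized_negative(real_video, mask, inpaint_fn):
    """Keep real pixels outside the mask; take the inpainted content inside it.

    real_video: (T, H, W, C) array. mask: (T, H, W), 1 = region to corrupt.
    inpaint_fn: stand-in for the frozen base model's restoration (hypothetical).
    """
    inpainted = inpaint_fn(real_video, mask)
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * inpainted + (1.0 - m) * real_video


def stub_inpaint(video, mask):
    # Placeholder only: a real pipeline would run the frozen video diffusion
    # model conditioned on the unmasked context. Here we just return zeros.
    return np.zeros_like(video)


rng = np.random.default_rng(0)
real = rng.random((8, 16, 16, 3)).astype(np.float32)  # toy "real" clip
mask = random_spatiotemporal_mask(real.shape[:3], rng)
negative = make_localized_negative(real, mask, stub_inpaint)
preference_pair = (real, negative)  # (preferred, rejected), sharing one mask
```

Because the preferred and rejected videos are identical outside the mask, any difference a preference loss sees between them comes from the masked region alone.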
LocalDPO creates localized preference pairs from real videos by applying random spatio-temporal masks and restoring masked regions with the frozen base model, then applies region-restricted DPO loss to improve fidelity and coherence in video diffusion models.
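The region-restricted loss can be sketched in the style of Diffusion-DPO, with every denoising error computed only inside the corruption mask. This is an illustration under stated assumptions, not the paper's exact objective. The function names, the use of `beta` as the only scale factor, and the generic "prediction vs. target" parameterization are all hypothetical. Depending on the base model, the target would be noise, velocity, or a flow target.

```python
import numpy as np


def masked_mse(pred, target, mask):
    """Mean squared error over the elements selected by `mask` only."""
    m = np.broadcast_to(mask, pred.shape)
    return float(((pred - target) ** 2 * m).sum() / max(m.sum(), 1.0))


def region_dpo_loss(pol_w, ref_w, tgt_w, pol_l, ref_l, tgt_l, mask, beta=1.0):
    """Diffusion-DPO-style loss restricted to the corrupted region (hypothetical).

    *_w: preferred (real) video, *_l: rejected (locally inpainted) video.
    pol_*/ref_*: predictions of the trainable policy and frozen reference
    on the same noised input; tgt_*: the matching denoising targets.
    """
    # How much the policy improves on the reference, per sample, inside the mask.
    d_w = masked_mse(pol_w, tgt_w, mask) - masked_mse(ref_w, tgt_w, mask)
    d_l = masked_mse(pol_l, tgt_l, mask) - masked_mse(ref_l, tgt_l, mask)
    # Reward the policy for improving more on the real video than on the negative.
    logits = -beta * (d_w - d_l)
    return float(np.logaddexp(0.0, -logits))  # numerically stable -log(sigmoid(logits))


rng = np.random.default_rng(0)
shape = (4, 8, 8, 3)  # toy (frames, height, width, channels)
tgt_w, tgt_l, ref_w, ref_l = (rng.standard_normal(shape) for _ in range(4))
mask = np.zeros((4, 8, 8, 1))
mask[1:3, 2:6, 2:6] = 1.0  # region shared by the preferred and rejected videos

# Policy identical to the reference: no preference signal, loss is log 2.
loss_init = region_dpo_loss(ref_w, ref_w, tgt_w, ref_l, ref_l, tgt_l, mask)
# Policy fits the real video perfectly inside the mask: loss drops below log 2.
loss_better = region_dpo_loss(tgt_w, ref_w, tgt_w, ref_l, ref_l, tgt_l, mask)
```

When the policy equals the frozen reference, the loss is exactly log 2. It falls only when the policy beats the reference on the real video relative to the corrupted one. Errors outside the mask contribute nothing.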
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-21T01:04:20.654858Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
39601cd5035df82a0a1948475d890022d1a5caa7e4fe1cfe2ac2c83965b3c8b4
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HFQBZVIDLX4CUCQZJBDV3CIAEL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 39601cd5035df82a0a1948475d890022d1a5caa7e4fe1cfe2ac2c83965b3c8b4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "43878163f4f400605643cda096dfa433a4547fb1f1f7f6d08d8bd90c7630978b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-01-07T16:32:17Z",
    "title_canon_sha256": "88155bb746b5acc291ca04674b6ac289e3d0a10639fd4dfc314d8efafefd3ca0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.04068",
    "kind": "arxiv",
    "version": 4
  }
}
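The hashing step in the verification command can also be reproduced with the Python standard library alone. The sketch below applies it to the record as displayed above. The function name `canonical_sha256` is illustrative. The serialization settings mirror the command: sorted keys, compact separators, and non-ASCII characters kept and encoded as UTF-8. The digest should equal the canonical hash only if this displayed JSON is the complete `canonical_record` served by the API, and that is not confirmed here.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 hex digest of `record` serialized like the verification command."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# The canonical record JSON as displayed above.
record = {
    "metadata": {
        "abstract_canon_sha256": "43878163f4f400605643cda096dfa433a4547fb1f1f7f6d08d8bd90c7630978b",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-01-07T16:32:17Z",
        "title_canon_sha256": "88155bb746b5acc291ca04674b6ac289e3d0a10639fd4dfc314d8efafefd3ca0",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.04068", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed above
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace. A reindented copy of the same record therefore hashes identically.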