pith:M44AZR7K
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
OpenVid-1M supplies over a million precise text-video pairs with expressive captions to improve text-to-video generation.
arxiv:2407.02371 v3 · 2024-07-02 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{M44AZR7KUISSMYFGAT2PSYYPBM}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. This open-scenario dataset contains over 1 million text-video pairs, facilitating research on T2V generation. Furthermore, we curate 433K 1080p videos from OpenVid-1M to create OpenVidHD-0.4M... Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens.
These claims rest on the assumption that the newly collected videos and captions are verifiably higher in quality and more precise than those of prior datasets such as WebVid-10M and Panda-70M, and that the MVDiT architecture delivers measurable gains attributable to its joint structure-semantic processing rather than to other training factors.
OpenVid-1M supplies over 1 million high-quality text-video pairs and introduces MVDiT, which improves text-to-video generation by making better use of both visual structure and text semantics.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:21.816981Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c
Aliases
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/M44AZR7KUISSMYFGAT2PSYYPBM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "230a2191ea85b2201b99bf2b8f086ab595e36b5159b23b660a63a7a64b90a4e2",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-07-02T15:40:29Z",
    "title_canon_sha256": "6674247ef4e27bb49c2ec829d0b8e94091ddb7195d8285ce828c482c1465f25f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.02371",
    "kind": "arxiv",
    "version": 3
  }
}
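The verification one-liner above can also be run offline against the record as printed here. The following is a minimal sketch in Python. It assumes the JSON shown on this page is exactly the `canonical_record` the endpoint returns, and it reuses the serialization settings from the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8). The names `RECORD`, `EXPECTED`, `canonical_bytes`, and `canonical_hash` are illustrative and not part of any Pith API.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
# Assumption: this matches the `canonical_record` field served by the endpoint.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "230a2191ea85b2201b99bf2b8f086ab595e36b5159b23b660a63a7a64b90a4e2",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2024-07-02T15:40:29Z",
        "title_canon_sha256": "6674247ef4e27bb49c2ec829d0b8e94091ddb7195d8285ce828c482c1465f25f",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.02371", "kind": "arxiv", "version": 3},
}

# Hash published under "Canonical hash" on this page.
EXPECTED = "67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no whitespace, then encode as UTF-8."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Because keys are sorted and whitespace is stripped before hashing, the digest does not depend on how the record is indented or on the order in which its keys are written.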