Pith Number

pith:M44AZR7K

pith:2024:M44AZR7KUISSMYFGAT2PSYYPBM

not attested · not anchored · not stored · refs resolved

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Jian Yang, Kepan Nan, Penghao Zhou, Rui Xie, Tiehan Fan, Xiang Li, Ying Tai, Zhenheng Yang, Zhijie Chen

OpenVid-1M supplies over a million precise text-video pairs with expressive captions to improve text-to-video generation.

arxiv:2407.02371 v3 · 2024-07-02 · cs.CV

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{M44AZR7KUISSMYFGAT2PSYYPBM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
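The page does not specify the merge algorithm, so the following is only a minimal sketch of one common way to make a merge deterministic: sort events by a total order and fold them into a state. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for illustration, not the Pith bundle format.

```python
from typing import Any


def merge_events(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into one state, independent of the order they arrive in.

    Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
    """
    state: dict[str, Any] = {}
    seen: set[str] = set()
    # Sorting by (timestamp, event id) gives every mirror the same sequence.
    for ev in sorted(events, key=lambda e: (e["ts"], e["id"])):
        if ev["id"] in seen:  # ignore duplicate deliveries of the same event
            continue
        seen.add(ev["id"])
        state[ev["field"]] = ev["value"]  # last writer wins per field
    return state


events = [
    {"id": "e2", "ts": "2026-05-18T00:00:00Z", "field": "citations", "value": "open"},
    {"id": "e1", "ts": "2026-05-17T00:00:00Z", "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": "2026-05-19T00:00:00Z", "field": "citations", "value": "closed"},
]
# Any permutation of the same events yields the same merged state.
print(merge_events(events) == merge_events(list(reversed(events))))  # True
```

Signature checking is left out of this sketch; a real mirror would presumably verify each event before folding it in.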

Claims

C1 · strongest claim

we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. This open-scenario dataset contains over 1 million text-video pairs, facilitating research on T2V generation. Furthermore, we curate 433K 1080p videos from OpenVid-1M to create OpenVidHD-0.4M... Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens.

C2 · weakest assumption

That the newly collected videos and captions are verifiably higher quality and more precise than prior datasets such as WebVid-10M and Panda-70M, and that the MVDiT architecture delivers measurable gains attributable to its joint structure-semantic processing rather than other training factors.

C3 · one-line summary

OpenVid-1M supplies over 1 million high-quality text-video pairs and introduces MVDiT, which aims to improve text-to-video generation by using both visual structure and text semantics.

References

16 extracted · 16 resolved · 7 Pith anchors

[1] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets · arXiv:2311.15127

[2] VideoCrafter1: Open Diffusion Models for High-Quality Video Generation · arXiv:2310.19512

[3] Adam: A Method for Stochastic Optimization · arXiv:1412.6980

[4] arXiv:2310.11440 (2023)

[5] Latte: Latent Diffusion Transformer for Video Generation · arXiv:2401.03048

Formal links

2 machine-checked theorem links

Cited by

35 papers in Pith

LaMo: Self-Supervised Latent Motion Priors for Physical Realism in Video Generation

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

RoPeSLR: 3D RoPE-driven Sparse-LowRank Attention for Efficient Diffusion Transformers

Image-to-Video Diffusion: From Foundations to Open Frontiers

LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

Receipt and verification

First computed	2026-05-17T23:39:21.816981Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c

Aliases

arxiv: 2407.02371 · arxiv_version: 2407.02371v3 · doi: 10.48550/arxiv.2407.02371 · pith_short_12: M44AZR7KUISS · pith_short_16: M44AZR7KUISSMYFG · pith_short_8: M44AZR7K
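The values on this page are consistent with a simple derivation: the 26-character identifier matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are its 8-, 12-, and 16-character prefixes. This is an observation from the values shown here, not a documented rule, and the function names below are made up for illustration.

```python
import base64


def pith_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, padding removed."""
    prefix = bytes.fromhex(canonical_hash_hex)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_aliases(pith: str) -> dict[str, str]:
    """Return the prefix aliases listed on the page (assumed to be plain prefixes)."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


canonical_hash = "67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c"
pith = pith_from_hash(canonical_hash)
print(pith)  # M44AZR7KUISSMYFGAT2PSYYPBM
print(short_aliases(pith)["pith_short_8"])  # M44AZR7K
```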

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/M44AZR7KUISSMYFGAT2PSYYPBM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 67380cc7eaa2252660a604f4f9630f0b3c355564591318ef7194b0b8e63d550c
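The hashing step of the pipeline above can also be written as a small Python function, which makes the canonicalization rules easier to see: sorted keys, no whitespace between tokens, UTF-8 output. It is a sketch of that step only; the function name `canonical_hash` and the toy records are illustrative, and no claim is made about what digest any particular record produces.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON serialization of `record`."""
    canonical = json.dumps(
        record,
        sort_keys=True,         # key order in the input must not matter
        separators=(",", ":"),  # no spaces after commas or colons
        ensure_ascii=False,     # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dicts with the same content but different key order hash identically.
a = {"schema_version": "1.0", "source": {"id": "2407.02371", "kind": "arxiv", "version": 3}}
b = {"source": {"version": 3, "kind": "arxiv", "id": "2407.02371"}, "schema_version": "1.0"}
print(canonical_hash(a) == canonical_hash(b))  # True
```

Because the serialization is fixed, any mirror that holds the same record content should arrive at the same digest regardless of how its JSON was formatted in transit.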

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "230a2191ea85b2201b99bf2b8f086ab595e36b5159b23b660a63a7a64b90a4e2",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-07-02T15:40:29Z",
    "title_canon_sha256": "6674247ef4e27bb49c2ec829d0b8e94091ddb7195d8285ce828c482c1465f25f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.02371",
    "kind": "arxiv",
    "version": 3
  }
}