MiraData: A large-scale video dataset with long durations and structured captions. Advances in Neural Information Processing Systems, 37:48955–48970

Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan · 2024

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

SANA-WM is a 2.6B-parameter efficient world model that synthesizes minute-scale 720p videos with 6-DoF camera control, trained on 213K public clips in 15 days on 64 H100s and runnable on single GPUs at 36x higher throughput than prior open baselines.

VDCook: DIY video data cook your MLLMs

cs.LG · 2026-03-04 · unverdicted · novelty 5.0

VDCook is an automated, self-evolving platform for generating in-domain video datasets for MLLMs via natural language queries, retrieval-synthesis, and multi-dimensional metadata.

citing papers explorer

Showing 2 of 2 citing papers.

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
cs.CV · 2026-05-14 · unverdicted · none · ref 88
VDCook: DIY video data cook your MLLMs
cs.LG · 2026-03-04 · unverdicted · none · ref 10
