pith. sign in

super hub Canonical reference

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Canonical reference. 85% of citing Pith papers cite this work as background.

335 Pith papers citing it
Background 85% of classified citations
abstract

Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.

hub tools

citation-role summary

background 73 baseline 6 method 3 dataset 2 other 1

citation-polarity summary

claims ledger

  • abstract Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including

authors

co-cited works

clear filters

representative citing papers

Flow-GRPO: Training Flow Matching Models via Online RL

cs.CV · 2025-05-08 · unverdicted · novelty 8.0

Flow-GRPO is the first online RL method for flow matching models, raising GenEval accuracy from 63% to 95% and text-rendering accuracy from 59% to 92% with little reward hacking.

OmniTryOn: Video Try-On Anything at Once!

cs.CV · 2026-06-07 · unverdicted · novelty 7.0

OmniTryOn performs multi-object video virtual try-on in one pass using first-frame wearable caching and spatiotemporal RoPE, outperforming single-garment baselines on a new TryAny-Bench dataset.

Ultra-Fast Neural Video Compression

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

DCVC-UF uses chunk-based joint encoding and parallel frame-specific decoding to deliver ultra-fast neural video compression while claiming new state-of-the-art rate-distortion performance.

citing papers explorer

Showing 6 of 6 citing papers after filters.

  • From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation cs.CV · 2026-05-09 · unverdicted · none · ref 36 · internal anchor

    A kinematic-to-visual lifting paradigm combined with hierarchically routed control generates action-conditioned surgical videos with better faithfulness, fidelity, and efficiency.

  • DCR: Counterfactual Attractor Guidance for Rare Compositional Generation cs.CV · 2026-05-07 · unverdicted · none · ref 23 · internal anchor

    DCR uses a counterfactual attractor and projection-based repulsion to suppress default completion bias in diffusion models, improving fidelity for rare compositional prompts while preserving quality.

  • LottieGPT: Tokenizing Vector Animation for Autoregressive Generation cs.CV · 2026-04-13 · unverdicted · none · ref 43 · internal anchor

    LottieGPT tokenizes Lottie animations into compact sequences and fine-tunes Qwen-VL to autoregressively generate coherent vector animations from natural language or visual prompts, outperforming prior SVG models.

  • Generative Refinement Networks for Visual Synthesis cs.CV · 2026-04-14 · unverdicted · none · ref 29 · internal anchor

    GRN uses hierarchical binary quantization and entropy-guided refinement to set new ImageNet records of 0.56 rFID for reconstruction and 1.81 gFID for class-conditional generation while releasing code and models.

  • Context Unrolling in Omni Models cs.CV · 2026-04-23 · unverdicted · none · ref 22 · internal anchor

    Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.

  • Motif-Video 2B: Technical Report cs.CV · 2026-04-14 · unverdicted · none · ref 18 · 2 links · internal anchor

    Motif-Video 2B reaches 83.76% on VBench, outperforming a 14B-parameter model with 7x fewer parameters and far less training data through shared cross-attention and a three-part backbone.