{"work":{"id":"881efa7e-7e73-4c66-9cc3-2803e551061c","openalex_id":null,"doi":null,"arxiv_id":"2412.03603","raw_key":null,"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","authors":null,"authors_text":"Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al","year":2024,"venue":"cs.CV","abstract":"Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.","external_url":"https://arxiv.org/abs/2412.03603","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T13:13:27.512591+00:00","pith_arxiv_id":"2412.03603","created_at":"2026-05-08T18:28:57.555143+00:00","updated_at":"2026-06-29T13:13:27.512591+00:00","title_quality_ok":true,"display_title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","render_title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models"},"hub":{"state":{"work_id":"881efa7e-7e73-4c66-9cc3-2803e551061c","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":260,"external_cited_by_count":null,"distinct_field_count":12,"first_pith_cited_at":"2024-01-05T19:55:15+00:00","last_pith_cited_at":"2026-05-31T05:35:36+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:08:47.250239+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":73},{"context_role":"baseline","n":6},{"context_role":"method","n":3},{"context_role":"dataset","n":2},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":72},{"context_polarity":"baseline","n":6},{"context_polarity":"use_method","n":3},{"context_polarity":"unclear","n":2},{"context_polarity":"use_dataset","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","claims":[{"claim_text":"Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks HunyuanVideo: A Systematic Framework For Large Video Generative Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T00:54:00.645165+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"612eb159-d5f2-4491-9b9a-42f3006dd60f","orcid":null,"display_name":"Weijie Kong"},{"id":"fd999b1b-e124-441e-8fcb-1b23b8b8e0d5","orcid":null,"display_name":"Qi Tian"},{"id":"8d61f941-089d-47ca-acdd-abcccc0b3eec","orcid":null,"display_name":"Zijian Zhang"},{"id":"8880cf02-a981-4d92-b228-0b4d0a7640b8","orcid":null,"display_name":"Rox Min"},{"id":"89a53a78-cdb0-41a3-b9bf-a7c14cac66a5","orcid":null,"display_name":"Zuozhuo Dai"},{"id":"6782d843-b714-4d5f-b5fb-8bfd9e4562d0","orcid":null,"display_name":"Jin Zhou"}]},"error":null,"updated_at":"2026-05-14T00:54:01.687625+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T01:24:13.793345+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Wan: Open and Advanced Large-Scale Video Generative Models","work_id":"ad3ebc3b-4224-46c9-b61d-bcf135da0a7c","shared_citers":99},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":48},{"title":"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets","work_id":"4f68eada-27e3-437a-a2fe-6e4ca524d0d3","shared_citers":48},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":28},{"title":"AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning","work_id":"1f9d1d3b-a6d6-45a9-9f13-51393c03be8a","shared_citers":22},{"title":"Movie Gen: A Cast of Media Foundation Models","work_id":"a6a118b0-002f-4b19-881f-7f1183e0d7d8","shared_citers":22},{"title":"Open-Sora: Democratizing Efficient Video Production for All","work_id":"8b29ba7b-3d84-4281-85b7-9eaf905afd7f","shared_citers":22},{"title":"LTX-Video: Realtime Video Latent Diffusion","work_id":"cee5c521-3ce9-466e-a035-1e42f89254f4","shared_citers":21},{"title":"Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion","work_id":"53e58ef9-7932-4b83-b757-34ac14db3e0f","shared_citers":21},{"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","work_id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","shared_citers":20},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":17},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":15},{"title":"Qwen-Image Technical Report","work_id":"d06d7ecc-7579-4f89-a60b-4278a0f3c562","shared_citers":15},{"title":"CameraCtrl: Enabling Camera Control for Text-to-Video Generation","work_id":"1c05c278-c023-4ef0-a359-25a41f1065eb","shared_citers":14},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":14},{"title":"Make-A-Video: Text-to-Video Generation without Text-Video Data","work_id":"52a801fc-a707-45a1-a8cd-0d6702f124ab","shared_citers":14},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":14},{"title":"Skyreels-v2: Infinite-length film generative model","work_id":"2ce11350-273e-4f0d-ae78-292aa3151060","shared_citers":14},{"title":"CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers","work_id":"2dbd6bcd-fc98-4fbf-b586-f6d94fe1abd2","shared_citers":13},{"title":"Imagen Video: High Definition Video Generation with Diffusion Models","work_id":"bb20d241-dc6f-4b0a-b071-fd43a2cbd57f","shared_citers":13},{"title":"Cosmos World Foundation Model Platform for Physical AI","work_id":"a2dba24c-318d-476a-8b21-4289c265810c","shared_citers":12},{"title":"Improving Video Generation with Human Feedback","work_id":"cfe4c01d-1cf7-4a00-ba86-06d583ca2cff","shared_citers":12},{"title":"Longlive: Real-time interactive long video generation","work_id":"29ec6550-6ac1-4cc6-8aae-b4206f1d5b38","shared_citers":12},{"title":"Self-forcing++: Towards minute-scale high-quality video generation","work_id":"3b83d5f5-6929-46ae-9de2-7c32af3c7346","shared_citers":12}],"time_series":[{"n":2,"year":2024},{"n":10,"year":2025},{"n":105,"year":2026}]},"error":null,"updated_at":"2026-05-14T00:54:01.795573+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"fixed":1,"items":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T00:54:06.112997+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","claims":[{"claim_text":"Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks HunyuanVideo: A Systematic Framework For Large Video Generative Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T00:54:00.647878+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","claims":[{"claim_text":"Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks HunyuanVideo: A Systematic Framework For Large Video Generative Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T00:54:01.702101+00:00"}},"summary":{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","claims":[{"claim_text":"Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks HunyuanVideo: A Systematic Framework For Large Video Generative Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Wan: Open and Advanced Large-Scale Video Generative Models","work_id":"ad3ebc3b-4224-46c9-b61d-bcf135da0a7c","shared_citers":99},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":48},{"title":"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets","work_id":"4f68eada-27e3-437a-a2fe-6e4ca524d0d3","shared_citers":48},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":28},{"title":"AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning","work_id":"1f9d1d3b-a6d6-45a9-9f13-51393c03be8a","shared_citers":22},{"title":"Movie Gen: A Cast of Media Foundation Models","work_id":"a6a118b0-002f-4b19-881f-7f1183e0d7d8","shared_citers":22},{"title":"Open-Sora: Democratizing Efficient Video Production for All","work_id":"8b29ba7b-3d84-4281-85b7-9eaf905afd7f","shared_citers":22},{"title":"LTX-Video: Realtime Video Latent Diffusion","work_id":"cee5c521-3ce9-466e-a035-1e42f89254f4","shared_citers":21},{"title":"Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion","work_id":"53e58ef9-7932-4b83-b757-34ac14db3e0f","shared_citers":21},{"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","work_id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","shared_citers":20},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":17},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":15},{"title":"Qwen-Image Technical Report","work_id":"d06d7ecc-7579-4f89-a60b-4278a0f3c562","shared_citers":15},{"title":"CameraCtrl: Enabling Camera Control for Text-to-Video Generation","work_id":"1c05c278-c023-4ef0-a359-25a41f1065eb","shared_citers":14},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":14},{"title":"Make-A-Video: Text-to-Video Generation without Text-Video Data","work_id":"52a801fc-a707-45a1-a8cd-0d6702f124ab","shared_citers":14},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":14},{"title":"Skyreels-v2: Infinite-length film generative model","work_id":"2ce11350-273e-4f0d-ae78-292aa3151060","shared_citers":14},{"title":"CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers","work_id":"2dbd6bcd-fc98-4fbf-b586-f6d94fe1abd2","shared_citers":13},{"title":"Imagen Video: High Definition Video Generation with Diffusion Models","work_id":"bb20d241-dc6f-4b0a-b071-fd43a2cbd57f","shared_citers":13},{"title":"Cosmos World Foundation Model Platform for Physical AI","work_id":"a2dba24c-318d-476a-8b21-4289c265810c","shared_citers":12},{"title":"Improving Video Generation with Human Feedback","work_id":"cfe4c01d-1cf7-4a00-ba86-06d583ca2cff","shared_citers":12},{"title":"Longlive: Real-time interactive long video generation","work_id":"29ec6550-6ac1-4cc6-8aae-b4206f1d5b38","shared_citers":12},{"title":"Self-forcing++: Towards minute-scale high-quality video generation","work_id":"3b83d5f5-6929-46ae-9de2-7c32af3c7346","shared_citers":12}],"time_series":[{"n":2,"year":2024},{"n":10,"year":2025},{"n":105,"year":2026}]},"authors":[{"id":"6782d843-b714-4d5f-b5fb-8bfd9e4562d0","orcid":null,"display_name":"Jin Zhou","source":"manual","import_confidence":0.72},{"id":"fd999b1b-e124-441e-8fcb-1b23b8b8e0d5","orcid":null,"display_name":"Qi Tian","source":"manual","import_confidence":0.72},{"id":"8880cf02-a981-4d92-b228-0b4d0a7640b8","orcid":null,"display_name":"Rox Min","source":"manual","import_confidence":0.72},{"id":"612eb159-d5f2-4491-9b9a-42f3006dd60f","orcid":null,"display_name":"Weijie Kong","source":"manual","import_confidence":0.72},{"id":"8d61f941-089d-47ca-acdd-abcccc0b3eec","orcid":null,"display_name":"Zijian Zhang","source":"manual","import_confidence":0.72},{"id":"89a53a78-cdb0-41a3-b9bf-a7c14cac66a5","orcid":null,"display_name":"Zuozhuo Dai","source":"manual","import_confidence":0.72}]}}