{"paper":{"title":"HunyuanVideo 1.5 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bing Wu, Bo Peng, Changlin Li, Chang Zou, Coopers Li, Duojun Huang, Fang Yang, Gu Gong, Guojian Xiao, Hao Tan, Jack Peng, Jiahe Tian, Jianbing Wu, Jiangfeng Xiong, Jiaxin Lin, Jie Jiang, Jie Liu, Jiesong Lian, Jihong Zhang, Kaihang Pan, Lei Wang, Lin Niu, Linus, Miles Yang, Mingtao Chen, Mingyang Chen, Mingzhe Zheng, Patrol, Peizhen Zhang, Peng Chen, Penghao Zhao, Qiangqiang Hu, Qi Tian, Qiuyong Xiao, Qi Yang, Rui Yuan, Runzhou Wu, Ryan Xu, Shanshan Sang, Shisheng Huang, Shuo Huang, Siruis Gong, Songtao Liu, Weijie Kong, Weiting Guo, Weiyan Wang, Wenzhi Sun, Xiang Yuan, Xianshun Ren, Xiao He, Xiaojia Chen, Xiaoyan Yuan, Xiaoyue Mi, Xiawei Hu, Xiele Wu, Xinchi Deng, Xin Li, Xuefei Zhe, Yang Li, Yanxin Long, Yepeng Zhang, Yifu Sun, Yiting Lu, Yitong Li, Yixuan Li, You Huang, Yuanbo Peng, Yuan Zhou, Yue Wu, Yuhang Deng, Yuhong Liu, Yu Tang, Zhao Zhong, Zhenyu Wang, Zhenzhi Lu, Zhichao Hu, Zhiguang Liu, Zhihe Yang, Zilin Yang, Zixiang Zhou, Zuozhuo Dai","submitted_at":"2025-11-24T08:22:07Z","abstract_excerpt":"We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. 
Leveraging these de"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"this compact and proficient model establishes a new state-of-the-art among open-source video generation models","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the internal experiments provide fair, comprehensive comparisons to prior open-source models without undisclosed data selection or evaluation biases.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HunyuanVideo 1.5 delivers state-of-the-art open-source text-to-video and image-to-video generation with an 8.3B parameter DiT model featuring SSTA attention, glyph-aware encoding, and progressive training.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e09eb6c3c7c54907e6278f5042bb0f78c95cd48f1abe5726f16e8f568a137d87"},"source":{"id":"2511.18870","kind":"arxiv","version":2},"verdict":{"id":"4b07a996-5e7a-4240-a9b2-9ba8f9d2f02d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:24:05.282106Z","strongest_claim":"this compact and proficient model establishes a new state-of-the-art among open-source video generation models","one_line_summary":"HunyuanVideo 1.5 delivers state-of-the-art open-source text-to-video and image-to-video generation with an 8.3B parameter DiT model featuring SSTA attention, glyph-aware encoding, and progressive training.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the internal experiments provide fair, comprehensive comparisons to prior open-source 
models without undisclosed data selection or evaluation biases.","pith_extraction_headline":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware."},"references":{"count":22,"sample":[{"doi":"","year":2025,"title":"Kling 2.5 turbo","work_id":"9ae2d3cd-8371-44d1-80ca-132a95e1c9e6","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Google DeepMind. Veo 3.1. https://deepmind.google/technologies/veo/, 2025","work_id":"f9cff5d1-11c7-4d3f-b828-e6191b2a8fb8","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"OpenAI. Sora 2. https://openai.com/sora, 2025","work_id":"6637c1d2-6ab4-4033-8851-f02ca9f82247","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","work_id":"881efa7e-7e73-4c66-9cc3-2803e551061c","ref_index":4,"cited_arxiv_id":"2412.03603","is_internal_anchor":true},{"doi":"","year":2025,"title":"Step-video-t2v technical report: The practice, challenges, and future of video foundation model","work_id":"f9be7275-0997-4f96-bbde-dcd6ea138476","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":22,"snapshot_sha256":"5c3c296aaa536284093af2db2f0ebe668b97f2ab4b438bb6aa015d8e87d54eab","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f4d7f5ab884ef984fd1a8163d8e9a5534b66074d3c2864fc43f761f8febe9b2c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}