{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:GEEMHY33UYXFKVM5POAYQWCH3I","short_pith_number":"pith:GEEMHY33","schema_version":"1.0","canonical_sha256":"3108c3e37ba62e55559d7b81885847da3e9ae82ec85eaeaa49e52e3b2602088b","source":{"kind":"arxiv","id":"2511.18870","version":2},"attestation_state":"computed","paper":{"title":"HunyuanVideo 1.5 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bing Wu, Bo Peng, Changlin Li, Chang Zou, Coopers Li, Duojun Huang, Fang Yang, Gu Gong, Guojian Xiao, Hao Tan, Jack Peng, Jiahe Tian, Jianbing Wu, Jiangfeng Xiong, Jiaxin Lin, Jie Jiang, Jie Liu, Jiesong Lian, Jihong Zhang, Kaihang Pan, Lei Wang, Lin Niu, Linus, Miles Yang, Mingtao Chen, Mingyang Chen, Mingzhe Zheng, Patrol, Peizhen Zhang, Peng Chen, Penghao Zhao, Qiangqiang Hu, Qi Tian, Qiuyong Xiao, Qi Yang, Rui Yuan, Runzhou Wu, Ryan Xu, Shanshan Sang, Shisheng Huang, Shuo Huang, Siruis Gong, Songtao Liu, Weijie Kong, Weiting Guo, Weiyan Wang, Wenzhi Sun, Xiang Yuan, Xianshun Ren, Xiao He, Xiaojia Chen, Xiaoyan Yuan, Xiaoyue Mi, Xiawei Hu, Xiele Wu, Xinchi Deng, Xin Li, Xuefei Zhe, Yang Li, Yanxin Long, Yepeng Zhang, Yifu Sun, Yiting Lu, Yitong Li, Yixuan Li, You Huang, Yuanbo Peng, Yuan Zhou, Yue Wu, Yuhang Deng, Yuhong Liu, Yu Tang, Zhao Zhong, Zhenyu Wang, Zhenzhi Lu, Zhichao Hu, Zhiguang Liu, Zhihe Yang, Zilin Yang, Zixiang Zhou, Zuozhuo Dai","submitted_at":"2025-11-24T08:22:07Z","abstract_excerpt":"We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these de"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2511.18870","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2025-11-24T08:22:07Z","cross_cats_sorted":[],"title_canon_sha256":"dcbbc66a83b88876ae4664b2f29b18fa1653df68a5457e1a8cfe1a41e383e500","abstract_canon_sha256":"d7a84ef786f3ff82a9523ece279fcd44b32aca8adb7b7c2be07d749ac3f7a992"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:53.818855Z","signature_b64":"fUg2WJ/9QQGWQQuzux/bM6t4EYb5sEB8VyGcLbxQBgQc5cWA6oJ0RrZm7YwD1e35YsCNkPyJa4SqkdmR2R2WDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"3108c3e37ba62e55559d7b81885847da3e9ae82ec85eaeaa49e52e3b2602088b","last_reissued_at":"2026-05-17T23:38:53.818228Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:53.818228Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"HunyuanVideo 1.5 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bing Wu, Bo Peng, Changlin Li, Chang Zou, Coopers Li, Duojun Huang, Fang Yang, Gu Gong, Guojian Xiao, Hao Tan, Jack Peng, Jiahe Tian, Jianbing Wu, Jiangfeng Xiong, Jiaxin Lin, Jie Jiang, Jie Liu, Jiesong Lian, Jihong Zhang, Kaihang Pan, Lei Wang, Lin Niu, Linus, Miles Yang, Mingtao Chen, Mingyang Chen, Mingzhe Zheng, Patrol, Peizhen Zhang, Peng Chen, Penghao Zhao, Qiangqiang Hu, Qi Tian, Qiuyong Xiao, Qi Yang, Rui Yuan, Runzhou Wu, Ryan Xu, Shanshan Sang, Shisheng Huang, Shuo Huang, Siruis Gong, Songtao Liu, Weijie Kong, Weiting Guo, Weiyan Wang, Wenzhi Sun, Xiang Yuan, Xianshun Ren, Xiao He, Xiaojia Chen, Xiaoyan Yuan, Xiaoyue Mi, Xiawei Hu, Xiele Wu, Xinchi Deng, Xin Li, Xuefei Zhe, Yang Li, Yanxin Long, Yepeng Zhang, Yifu Sun, Yiting Lu, Yitong Li, Yixuan Li, You Huang, Yuanbo Peng, Yuan Zhou, Yue Wu, Yuhang Deng, Yuhong Liu, Yu Tang, Zhao Zhong, Zhenyu Wang, Zhenzhi Lu, Zhichao Hu, Zhiguang Liu, Zhihe Yang, Zilin Yang, Zixiang Zhou, Zuozhuo Dai","submitted_at":"2025-11-24T08:22:07Z","abstract_excerpt":"We present HunyuanVideo 1.5, a lightweight yet powerful open-source video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs. This achievement is built upon several key components, including meticulous data curation, an advanced DiT architecture featuring selective and sliding tile attention (SSTA), enhanced bilingual understanding through glyph-aware text encoding, progressive pre-training and post-training, and an efficient video super-resolution network. Leveraging these de"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"this compact and proficient model establishes a new state-of-the-art among open-source video generation models","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the internal experiments provide fair, comprehensive comparisons to prior open-source models without undisclosed data selection or evaluation biases.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HunyuanVideo 1.5 delivers state-of-the-art open-source text-to-video and image-to-video generation with an 8.3B parameter DiT model featuring SSTA attention, glyph-aware encoding, and progressive training.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"e09eb6c3c7c54907e6278f5042bb0f78c95cd48f1abe5726f16e8f568a137d87"},"source":{"id":"2511.18870","kind":"arxiv","version":2},"verdict":{"id":"4b07a996-5e7a-4240-a9b2-9ba8f9d2f02d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:24:05.282106Z","strongest_claim":"this compact and proficient model establishes a new state-of-the-art among open-source video generation models","one_line_summary":"HunyuanVideo 1.5 delivers state-of-the-art open-source text-to-video and image-to-video generation with an 8.3B parameter DiT model featuring SSTA attention, glyph-aware encoding, and progressive training.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the internal experiments provide fair, comprehensive comparisons to prior open-source models without undisclosed data selection or evaluation biases.","pith_extraction_headline":"An 8.3-billion-parameter model delivers state-of-the-art open-source video generation on consumer hardware."},"references":{"count":22,"sample":[{"doi":"","year":2025,"title":"Kling 2.5 turbo","work_id":"9ae2d3cd-8371-44d1-80ca-132a95e1c9e6","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Google DeepMind. Veo 3.1.https://deepmind.google/technologies/veo/, 2025","work_id":"f9cff5d1-11c7-4d3f-b828-e6191b2a8fb8","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"OpenAI. Sora 2.https://openai.com/sora, 2025","work_id":"6637c1d2-6ab4-4033-8851-f02ca9f82247","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","work_id":"881efa7e-7e73-4c66-9cc3-2803e551061c","ref_index":4,"cited_arxiv_id":"2412.03603","is_internal_anchor":true},{"doi":"","year":2025,"title":"Step-video-t2v tech- nical report: The practice, challenges, and future of video foundation model","work_id":"f9be7275-0997-4f96-bbde-dcd6ea138476","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":22,"snapshot_sha256":"5c3c296aaa536284093af2db2f0ebe668b97f2ab4b438bb6aa015d8e87d54eab","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f4d7f5ab884ef984fd1a8163d8e9a5534b66074d3c2864fc43f761f8febe9b2c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2511.18870","created_at":"2026-05-17T23:38:53.818347+00:00"},{"alias_kind":"arxiv_version","alias_value":"2511.18870v2","created_at":"2026-05-17T23:38:53.818347+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2511.18870","created_at":"2026-05-17T23:38:53.818347+00:00"},{"alias_kind":"pith_short_12","alias_value":"GEEMHY33UYXF","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"GEEMHY33UYXFKVM5","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"GEEMHY33","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":43,"internal_anchor_count":43,"sample":[{"citing_arxiv_id":"2605.23891","citing_title":"Smart-Insertion-V: Photorealistic Video Insertion via a Closed-Loop Feedback Dual-Stream Framework","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2512.23994","citing_title":"PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14382","citing_title":"Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19398","citing_title":"Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by Multi-Task Synergy","ref_index":123,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21042","citing_title":"Dynamic Video Generation: Shaping Video Generation Across Time and Space","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21072","citing_title":"Q-ARVD: Quantizing Autoregressive Video Diffusion Models","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16503","citing_title":"Motif-Video 2B: Technical Report","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14382","citing_title":"Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16649","citing_title":"AtlasVid: Efficient Ultra-High-Resolution Long Video Generation via Decoupled Global-Local Modeling","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15980","citing_title":"Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15964","citing_title":"WorldVLN: Autoregressive World Action Model for Aerial Vision-Language Navigation","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17423","citing_title":"Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by Multi-Task Synergy","ref_index":122,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18346","citing_title":"Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19398","citing_title":"Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15458","citing_title":"Video Models Can Reason with Verifiable Rewards","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2512.23994","citing_title":"PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2602.02958","citing_title":"Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2512.13507","citing_title":"Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2603.11755","citing_title":"Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19092","citing_title":"RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation","ref_index":53,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14382","citing_title":"Delta Forcing: Trust Region Steering for Interactive Autoregressive Video Generation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13565","citing_title":"Qwen-Image-VAE-2.0 Technical Report","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2603.28489","citing_title":"Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms","ref_index":39,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I","json":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I.json","graph_json":"https://pith.science/api/pith-number/GEEMHY33UYXFKVM5POAYQWCH3I/graph.json","events_json":"https://pith.science/api/pith-number/GEEMHY33UYXFKVM5POAYQWCH3I/events.json","paper":"https://pith.science/paper/GEEMHY33"},"agent_actions":{"view_html":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I","download_json":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I.json","view_paper":"https://pith.science/paper/GEEMHY33","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2511.18870&json=true","fetch_graph":"https://pith.science/api/pith-number/GEEMHY33UYXFKVM5POAYQWCH3I/graph.json","fetch_events":"https://pith.science/api/pith-number/GEEMHY33UYXFKVM5POAYQWCH3I/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I/action/timestamp_anchor","attest_storage":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I/action/storage_attestation","attest_author":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I/action/author_attestation","sign_citation":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I/action/citation_signature","submit_replication":"https://pith.science/pith/GEEMHY33UYXFKVM5POAYQWCH3I/action/replication_record"}},"created_at":"2026-05-17T23:38:53.818347+00:00","updated_at":"2026-05-17T23:38:53.818347+00:00"}