{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:YQKBZ3CUQHSWPWPK6O3SOP7KVF","short_pith_number":"pith:YQKBZ3CU","schema_version":"1.0","canonical_sha256":"c4141cec5481e567d9eaf3b7273feaa96636f43db9659f58d8d650690a8fadb0","source":{"kind":"arxiv","id":"2509.23951","version":2},"attestation_state":"computed","paper":{"title":"HunyuanImage 3.0 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"HunyuanImage 3.0 delivers an 80B-parameter MoE model unifying multimodal understanding and generation that matches prior state-of-the-art results while being fully open-sourced.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bing Wu, Changlin Li, Chao Zhang, Chunyu Wang, Donghao Li, Duojun Huang, Fanbin Lu, Fang Yang, Hangting Chen, Hao Wen, Jiale Tao, Jianbing Wu, Jianchen Zhu, Jian-Wei Zhang, Jiaxin Lin, Jie Jiang, Jingmiao Yu, Junzhe Li, Kai Wang, Kipper Gong, Lei Wang, Linqing Wang, Linus, Lucas Wang, Lucaz Liu, Miles Yang, Peizhen Zhang, Peng Chen, Pengfei Wan, Penghao Zhao, Qinglin Lu, Qi Tian, Qixun Wang, Senhao Xie, Shi-Xue Zhang, Shu Liu, Siyu Cao, Songtao Liu, Tao Zhang, Tiankai Hang, Tianpeng Gu, Weigang Zhang, Weijie Kong, Weiyan Wang, Xiangwei Shen, Xiaofeng Yang, Xinchi Deng, Xin Li, Xiusen Gu, Xuan Yang, Xuefei Zhe, Yang Li, Yangyu Tao, Yanxin Long, Yepeng Zhang, Yiji Cheng, Ying Dong, Yingfang Zhang, Yixuan Shi, Yuanbo Peng, Yue Wu, Yuhong Liu, Yu Liu, Yutao Cui, Yuyang Peng, Zhantao Yang, Zhao Zhong, Zhengkai Jiang, Zheng Yuan, Zhenxi Li, Zhimin Li, Zhiyuan Zhao, Zihao Zhang, Zijian Zhang","submitted_at":"2025-09-28T16:14:10Z","abstract_excerpt":"We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. 
The achievement of HunyuanImage 3.0 relies on several key components, including meticulous data curation, advanced architecture design, a native Chain-of-Thoughts schema, progressive model pre-training, aggressive model post-training, and an efficient infrastructure that enables large-scale training and inference. With these advancements, we successfully trained a Mixture-of-Experts (MoE) model compr"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2509.23951","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2025-09-28T16:14:10Z","cross_cats_sorted":[],"title_canon_sha256":"306199ab13eb864830f8984ac307ff631c684db66f2573fa72da20a42d5e4e48","abstract_canon_sha256":"0e6cebf8e6403076d2e4d1c409e0d5883523b4ef369be85cef11b61ab36c2e1e"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:49.435515Z","signature_b64":"VfpU4odhyBqGHVuiCcl4TwEDB7RtT+bU96zZHZ2vtOF9VktB2YH4wMI+YEfXETQ/mUT56gcXsCT+K27WOI4dDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"c4141cec5481e567d9eaf3b7273feaa96636f43db9659f58d8d650690a8fadb0","last_reissued_at":"2026-05-17T23:38:49.434990Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:49.434990Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"HunyuanImage 3.0 Technical Report","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"HunyuanImage 3.0 delivers an 80B-parameter MoE model 
unifying multimodal understanding and generation that matches prior state-of-the-art results while being fully open-sourced.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bing Wu, Changlin Li, Chao Zhang, Chunyu Wang, Donghao Li, Duojun Huang, Fanbin Lu, Fang Yang, Hangting Chen, Hao Wen, Jiale Tao, Jianbing Wu, Jianchen Zhu, Jian-Wei Zhang, Jiaxin Lin, Jie Jiang, Jingmiao Yu, Junzhe Li, Kai Wang, Kipper Gong, Lei Wang, Linqing Wang, Linus, Lucas Wang, Lucaz Liu, Miles Yang, Peizhen Zhang, Peng Chen, Pengfei Wan, Penghao Zhao, Qinglin Lu, Qi Tian, Qixun Wang, Senhao Xie, Shi-Xue Zhang, Shu Liu, Siyu Cao, Songtao Liu, Tao Zhang, Tiankai Hang, Tianpeng Gu, Weigang Zhang, Weijie Kong, Weiyan Wang, Xiangwei Shen, Xiaofeng Yang, Xinchi Deng, Xin Li, Xiusen Gu, Xuan Yang, Xuefei Zhe, Yang Li, Yangyu Tao, Yanxin Long, Yepeng Zhang, Yiji Cheng, Ying Dong, Yingfang Zhang, Yixuan Shi, Yuanbo Peng, Yue Wu, Yuhong Liu, Yu Liu, Yutao Cui, Yuyang Peng, Zhantao Yang, Zhao Zhong, Zhengkai Jiang, Zheng Yuan, Zhenxi Li, Zhimin Li, Zhiyuan Zhao, Zihao Zhang, Zijian Zhang","submitted_at":"2025-09-28T16:14:10Z","abstract_excerpt":"We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. The achievement of HunyuanImage 3.0 relies on several key components, including meticulous data curation, advanced architecture design, a native Chain-of-Thoughts schema, progressive model pre-training, aggressive model post-training, and an efficient infrastructure that enables large-scale training and inference. 
With these advancements, we successfully trained a Mixture-of-Experts (MoE) model compr"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"We successfully trained a Mixture-of-Experts (MoE) model comprising over 80 billion parameters in total, with 13 billion parameters activated per token during inference, making it the largest and most powerful open-source image generative model to date.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the reported human and automatic evaluations used representative test sets and unbiased raters, and that no undisclosed data or compute advantages explain the competitive results.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HunyuanImage 3.0 delivers an 80B-parameter MoE model unifying multimodal understanding and generation that matches prior state-of-the-art results while being fully open-sourced.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"6837f2a9b2b98ff02408e199fc69eda40ac60728cc4c412355b69f4b56385427"},"source":{"id":"2509.23951","kind":"arxiv","version":2},"verdict":{"id":"e54ea6ae-5acc-461a-a1c1-2d16cc284b16","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T01:56:26.396669Z","strongest_claim":"We successfully trained a Mixture-of-Experts (MoE) model comprising over 80 billion parameters in total, with 13 billion parameters activated per token during inference, making it the largest and most powerful open-source image generative model to date.","one_line_summary":"HunyuanImage 3.0 delivers an 80B-parameter MoE model unifying multimodal understanding and generation that matches prior state-of-the-art results while being fully open-sourced.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the reported 
human and automatic evaluations used representative test sets and unbiased raters, and that no undisclosed data or compute advantages explain the competitive results.","pith_extraction_headline":""},"references":{"count":45,"sample":[{"doi":"","year":2020,"title":"Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851","work_id":"82ba805b-3e59-43c6-b37f-3aa1940eea68","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Denoising Diffusion Implicit Models","work_id":"8fa2128b-d18c-405c-ac92-0e669cf89ac0","ref_index":2,"cited_arxiv_id":"2010.02502","is_internal_anchor":true},{"doi":"","year":2021,"title":"Diffusion models beat gans on image synthesis","work_id":"632b8c2b-98f4-4211-8cd8-c9bb7c5b4b8b","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Score-Based Generative Modeling through Stochastic Differential Equations","work_id":"d9110e53-a5d4-4794-a4c5-a575e91c31ad","ref_index":4,"cited_arxiv_id":"2011.13456","is_internal_anchor":true},{"doi":"","year":2022,"title":"High-resolution image synthesis with latent diffusion 
models","work_id":"5427867b-47ba-4d43-a415-2912684a2d41","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":45,"snapshot_sha256":"835cfaf63e6c23787aa780b264ce51f14d86513957a3c6b9d67b91966571d80e","internal_anchors":15},"formal_canon":{"evidence_count":3,"snapshot_sha256":"947a160d758c455a7139d820b1aa8d01376cbeb46ccbb3fb1e40465071ea07c1"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2509.23951","created_at":"2026-05-17T23:38:49.435078+00:00"},{"alias_kind":"arxiv_version","alias_value":"2509.23951v2","created_at":"2026-05-17T23:38:49.435078+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2509.23951","created_at":"2026-05-17T23:38:49.435078+00:00"},{"alias_kind":"pith_short_12","alias_value":"YQKBZ3CUQHSW","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"YQKBZ3CUQHSWPWPK","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"YQKBZ3CU","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":38,"internal_anchor_count":38,"sample":[{"citing_arxiv_id":"2603.28767","citing_title":"Gen-Searcher: Reinforcing Agentic Search for Image Generation","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21487","citing_title":"Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21605","citing_title":"GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23522","citing_title":"Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching 
Models","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2602.00122","citing_title":"VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21573","citing_title":"Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21605","citing_title":"GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22344","citing_title":"Bernini: Latent Semantic Planning for Video Diffusion","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2601.01593","citing_title":"Beyond Patches: Global-aware Autoregressive Model for Multimodal Few-Shot Font Generation","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2603.00607","citing_title":"IdGlow: Dynamic Identity Modulation for Multi-Subject Generation","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by Multi-Task Synergy","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21487","citing_title":"Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05204","citing_title":"D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17311","citing_title":"SpecSem-Net: Integrating Spectral and Semantic Features for Robust AI-generated Video Detection","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17834","citing_title":"Stabilizing, Scaling & Enhancing MeanFlow for Large-scale Diffusion Distillation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by 
Multi-Task Synergy","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14876","citing_title":"Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2511.22663","citing_title":"AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2512.07584","citing_title":"LongCat-Image Technical Report","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2511.18870","citing_title":"HunyuanVideo 1.5 Technical Report","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2603.28767","citing_title":"Gen-Searcher: Reinforcing Agentic Search for Image Generation","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13565","citing_title":"Qwen-Image-VAE-2.0 Technical Report","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03400","citing_title":"Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03061","citing_title":"Can Nano Banana 2 Replace Traditional Image Restoration Models? 
An Evaluation of Its Performance on Image Restoration Tasks","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12500","citing_title":"SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture","ref_index":11,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF","json":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF.json","graph_json":"https://pith.science/api/pith-number/YQKBZ3CUQHSWPWPK6O3SOP7KVF/graph.json","events_json":"https://pith.science/api/pith-number/YQKBZ3CUQHSWPWPK6O3SOP7KVF/events.json","paper":"https://pith.science/paper/YQKBZ3CU"},"agent_actions":{"view_html":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF","download_json":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF.json","view_paper":"https://pith.science/paper/YQKBZ3CU","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2509.23951&json=true","fetch_graph":"https://pith.science/api/pith-number/YQKBZ3CUQHSWPWPK6O3SOP7KVF/graph.json","fetch_events":"https://pith.science/api/pith-number/YQKBZ3CUQHSWPWPK6O3SOP7KVF/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF/action/timestamp_anchor","attest_storage":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF/action/storage_attestation","attest_author":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF/action/author_attestation","sign_citation":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF/action/citation_signature","submit_replication":"https://pith.science/pith/YQKBZ3CUQHSWPWPK6O3SOP7KVF/action/replication_record"}},"created_at":"2026-05-17T23:38:49.435078+00:00","updated_at":"2026-05-17T23:38:49.435078+00:00"}