{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:IPPTRWIXLE5FPQTJBUPLGYMZ5T","short_pith_number":"pith:IPPTRWIX","schema_version":"1.0","canonical_sha256":"43df38d917593a57c2690d1eb36199ecf88174aee67352374c904b24f58931df","source":{"kind":"arxiv","id":"2505.22705","version":1},"attestation_state":"computed","paper":{"title":"HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"HiDream-I1 deploys a 17B-parameter sparse Diffusion Transformer that delivers state-of-the-art images in seconds.","cross_cats":["cs.MM"],"primary_cat":"cs.CV","authors_text":"Bo Zhao, Fengbin Gao, Fuchen Long, Jianzhuang Pan, Jingwen Chen, Kai Yu, Peihan Xu, Qi Cai, Rui Tian, Siyu Wang, Tao Mei, Ting Yao, Wenxuan Chen, Yang Chen, Yehao Li, Yiheng Zhang, Yimeng Wang, Yingwei Pan, Yi Peng, Zhaofan Qiu, Zijian Gong, Ziwei Feng","submitted_at":"2025-05-28T17:59:15Z","abstract_excerpt":"Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is constructed with a new sparse Diffusion Transformer (DiT) structure. Specifically, it starts with a dual-stream decoupled design of sparse DiT with dynamic Mixture-of-Experts (MoE) architecture, in whic"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2505.22705","kind":"arxiv","version":1},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2025-05-28T17:59:15Z","cross_cats_sorted":["cs.MM"],"title_canon_sha256":"82acc6184c8cf584c2e2b30e49132316e34edf4ac9601b8e5ee655ee61c55543","abstract_canon_sha256":"dd550ee1abcbe1bf090759debb645b058ecd0a633ffac7d24ab0285a932ca821"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:47.164011Z","signature_b64":"VzWZLqKYgXQqz8iGTjuS/HWDzQWt6adzkwC8rvnu5/sxkwFmMvifIb7jt7db6wqH94LpaufaxxMHenCKVnzkDQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"43df38d917593a57c2690d1eb36199ecf88174aee67352374c904b24f58931df","last_reissued_at":"2026-05-17T23:38:47.163589Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:47.163589Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"HiDream-I1 deploys a 17B-parameter sparse Diffusion Transformer that delivers state-of-the-art images in seconds.","cross_cats":["cs.MM"],"primary_cat":"cs.CV","authors_text":"Bo Zhao, Fengbin Gao, Fuchen Long, Jianzhuang Pan, Jingwen Chen, Kai Yu, Peihan Xu, Qi Cai, Rui Tian, Siyu Wang, Tao Mei, Ting Yao, Wenxuan Chen, Yang Chen, Yehao Li, Yiheng Zhang, Yimeng Wang, Yingwei Pan, Yi Peng, Zhaofan Qiu, Zijian Gong, Ziwei Feng","submitted_at":"2025-05-28T17:59:15Z","abstract_excerpt":"Recent advancements in image generative foundation models have prioritized quality improvements but often at the cost of increased computational complexity and inference latency. To address this critical trade-off, we introduce HiDream-I1, a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. HiDream-I1 is constructed with a new sparse Diffusion Transformer (DiT) structure. Specifically, it starts with a dual-stream decoupled design of sparse DiT with dynamic Mixture-of-Experts (MoE) architecture, in whic"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"HiDream-I1 ... achieves state-of-the-art image generation quality within seconds.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The dual-stream decoupled sparse DiT with dynamic MoE architecture delivers the claimed quality and speed without hidden trade-offs in training cost or generalization that are not visible in the abstract.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HiDream-I1 introduces a sparse DiT architecture with dual-stream processing and MoE for efficient state-of-the-art text-to-image generation, plus extensions to editing and an interactive agent.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"HiDream-I1 deploys a 17B-parameter sparse Diffusion Transformer that delivers state-of-the-art images in seconds.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"309e47e6ce805622a5ce7e8da766d1b120d9eceeed8cdc9c96d4d2e483a64fde"},"source":{"id":"2505.22705","kind":"arxiv","version":1},"verdict":{"id":"cfd22377-816a-4c29-8923-7a85dc9fb564","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T17:09:22.780761Z","strongest_claim":"HiDream-I1 ... achieves state-of-the-art image generation quality within seconds.","one_line_summary":"HiDream-I1 introduces a sparse DiT architecture with dual-stream processing and MoE for efficient state-of-the-art text-to-image generation, plus extensions to editing and an interactive agent.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The dual-stream decoupled sparse DiT with dynamic MoE architecture delivers the claimed quality and speed without hidden trade-offs in training cost or generalization that are not visible in the abstract.","pith_extraction_headline":"HiDream-I1 deploys a 17B-parameter sparse Diffusion Transformer that delivers state-of-the-art images in seconds."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"45d4632ed6860caff02a1ed7d21c204cdeed86ada1f3f4e2aced70ad40f52e45"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2505.22705","created_at":"2026-05-17T23:38:47.163653+00:00"},{"alias_kind":"arxiv_version","alias_value":"2505.22705v1","created_at":"2026-05-17T23:38:47.163653+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2505.22705","created_at":"2026-05-17T23:38:47.163653+00:00"},{"alias_kind":"pith_short_12","alias_value":"IPPTRWIXLE5F","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"IPPTRWIXLE5FPQTJ","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"IPPTRWIX","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":23,"internal_anchor_count":23,"sample":[{"citing_arxiv_id":"2605.16810","citing_title":"Training-Free Occluded Text Rendering via Glyph Priors and Attention-Guided Semantic Blending","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16951","citing_title":"Edit-GRPO: A Locality-Preserving Policy Optimization Framework for Image Editing","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2508.20751","citing_title":"Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2509.20360","citing_title":"EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2510.26583","citing_title":"Emu3.5: Native Multimodal Models are World Learners","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2512.07584","citing_title":"LongCat-Image Technical Report","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2602.06663","citing_title":"PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03249","citing_title":"BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13062","citing_title":"Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11061","citing_title":"HiDream-O1-Image: A Natively Unified Image Generative Foundation Model with Pixel-level Unified Transformer","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12271","citing_title":"Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12112","citing_title":"When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12500","citing_title":"SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2604.25477","citing_title":"DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2604.12163","citing_title":"Nucleus-Image: Sparse MoE for Image Generation","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2604.12322","citing_title":"Self-Adversarial One Step Generation via Condition Shifting","ref_index":1,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11487","citing_title":"NTIRE 2026 Challenge on Robust AI-Generated Image Detection in the Wild","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06870","citing_title":"RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2508.02324","citing_title":"Qwen-Image Technical Report","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18168","citing_title":"Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2605.04128","citing_title":"Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02641","citing_title":"Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE","ref_index":71,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02641","citing_title":"Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE","ref_index":26,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T","json":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T.json","graph_json":"https://pith.science/api/pith-number/IPPTRWIXLE5FPQTJBUPLGYMZ5T/graph.json","events_json":"https://pith.science/api/pith-number/IPPTRWIXLE5FPQTJBUPLGYMZ5T/events.json","paper":"https://pith.science/paper/IPPTRWIX"},"agent_actions":{"view_html":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T","download_json":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T.json","view_paper":"https://pith.science/paper/IPPTRWIX","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2505.22705&json=true","fetch_graph":"https://pith.science/api/pith-number/IPPTRWIXLE5FPQTJBUPLGYMZ5T/graph.json","fetch_events":"https://pith.science/api/pith-number/IPPTRWIXLE5FPQTJBUPLGYMZ5T/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T/action/timestamp_anchor","attest_storage":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T/action/storage_attestation","attest_author":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T/action/author_attestation","sign_citation":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T/action/citation_signature","submit_replication":"https://pith.science/pith/IPPTRWIXLE5FPQTJBUPLGYMZ5T/action/replication_record"}},"created_at":"2026-05-17T23:38:47.163653+00:00","updated_at":"2026-05-17T23:38:47.163653+00:00"}