{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:KSL44R7PRTQF47XVXVL6LVJ3YK","short_pith_number":"pith:KSL44R7P","schema_version":"1.0","canonical_sha256":"5497ce47ef8ce05e7ef5bd57e5d53bc288aad95588ff80f60b0f1c0d805d8e4d","source":{"kind":"arxiv","id":"2503.09642","version":3},"attestation_state":"computed","paper":{"title":"Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A commercial-level video generation model can be trained for $200,000.","cross_cats":["cs.AI"],"primary_cat":"cs.GR","authors_text":"Anbang Ye, Binluo Wang, Chaoyu Gong, Chenhui Shen, Gang Ren, Guojun Lei, Hang Xu, Hongxin Liu, Leijun Cheng, Limin Zhang, Minghao Li, Mingyan Jiang, Qianran Ma, Ruijie Zhang, Shijie Huang, Silan Hu, Tom Young, Wanying Liang, Wenjun Li, Xiang Lian, Xiangyu Peng, Xiaokang Wang, Xinying Guo, Xiwen Wu, Yang You, Yuanheng Zhao, Yuhui Wang, Yuqi Wang, Yuting Zhong, Yuxuan Lou, Zangwei Zheng, Zhuangyan Li, Ziang Wei","submitted_at":"2025-03-12T05:00:07Z","abstract_excerpt":"Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and sy"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":true},"canonical_record":{"source":{"id":"2503.09642","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.GR","submitted_at":"2025-03-12T05:00:07Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"7bf382ab4a32a6050bc9e324b5edf8938869a6638ded6c8e33a20c747c0c462c","abstract_canon_sha256":"f7c4092b7e10da5672e51ea7ad0ec03253d5920ee242e24bb989f422645577ae"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:47.923306Z","signature_b64":"WfSvbf+srz3thmFMYw/8I+pmtWCacmTVUNa1hbkofXzVBHwcRdFnymesARM+U20i6ZDniDXA0jmaSj+A7CMjCQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"5497ce47ef8ce05e7ef5bd57e5d53bc288aad95588ff80f60b0f1c0d805d8e4d","last_reissued_at":"2026-05-17T23:38:47.922770Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:47.922770Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A commercial-level video generation model can be trained for $200,000.","cross_cats":["cs.AI"],"primary_cat":"cs.GR","authors_text":"Anbang Ye, Binluo Wang, Chaoyu Gong, Chenhui Shen, Gang Ren, Guojun Lei, Hang Xu, Hongxin Liu, Leijun Cheng, Limin Zhang, Minghao Li, Mingyan Jiang, Qianran Ma, Ruijie Zhang, Shijie Huang, Silan Hu, Tom Young, Wanying Liang, Wenjun Li, Xiang Lian, Xiangyu Peng, Xiaokang Wang, Xinying Guo, Xiwen Wu, Yang You, Yuanheng Zhao, Yuhui Wang, Yuqi Wang, Yuting Zhong, Yuxuan Lou, Zangwei Zheng, Zhuangyan Li, Ziang Wei","submitted_at":"2025-03-12T05:00:07Z","abstract_excerpt":"Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and sy"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the reported $200k training cost accurately captures all resources used and that the human evaluations and VBench scores provide an unbiased, apples-to-apples comparison to the referenced leading models without undisclosed differences in evaluation protocols or model capabilities.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Open-Sora 2.0 achieves commercial-level video generation quality at a training cost of only $200k through data curation, architecture, training strategy, and system optimizations, and is released fully open-source.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A commercial-level video generation model can be trained for $200,000.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a0dc37a11819e71eeb78347c3280bc7c016c884aea095047cd9458e729da1c79"},"source":{"id":"2503.09642","kind":"arxiv","version":3},"verdict":{"id":"988bd637-2aef-404d-9aa7-380c0f81a6fe","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T12:05:44.703641Z","strongest_claim":"With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha.","one_line_summary":"Open-Sora 2.0 achieves commercial-level video generation quality at a training cost of only $200k through data curation, architecture, training strategy, and system optimizations, and is released fully open-source.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the reported $200k training cost accurately captures all resources used and that the human evaluations and VBench scores provide an unbiased, apples-to-apples comparison to the referenced leading models without undisclosed differences in evaluation protocols or model capabilities.","pith_extraction_headline":"A commercial-level video generation model can be trained for $200,000."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b51a7e3d858c8a558691399d0586acbc3469a20f1d58e9135b0f653cab6ec353"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2503.09642","created_at":"2026-05-17T23:38:47.922849+00:00"},{"alias_kind":"arxiv_version","alias_value":"2503.09642v3","created_at":"2026-05-17T23:38:47.922849+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2503.09642","created_at":"2026-05-17T23:38:47.922849+00:00"},{"alias_kind":"pith_short_12","alias_value":"KSL44R7PRTQF","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"KSL44R7PRTQF47XV","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"KSL44R7P","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":28,"internal_anchor_count":28,"sample":[{"citing_arxiv_id":"2603.08403","citing_title":"SPIRAL: Self-Evolving Action-Conditioned Video Generation via Reflective Planning Agents","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by Multi-Task Synergy","ref_index":92,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16503","citing_title":"Motif-Video 2B: Technical Report","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18678","citing_title":"Lance: Unified Multimodal Modeling by Multi-Task Synergy","ref_index":91,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15256","citing_title":"ReactiveGWM: Steering NPC in Reactive Game World Models","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2512.02826","citing_title":"From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2512.10248","citing_title":"RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2512.13281","citing_title":"VideoASMR-Bench: Can AI-Generated ASMR Videos Fool VLMs and Humans?","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2512.22317","citing_title":"LangPrecip: Language-Aware Multimodal Precipitation Nowcasting","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2602.04939","citing_title":"SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2510.02283","citing_title":"Self-Forcing++: Towards Minute-Scale High-Quality Video Generation","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2505.11709","citing_title":"EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01725","citing_title":"Motion-Aware Caching for Efficient Autoregressive Video Generation","ref_index":46,"is_internal_anchor":true},{"citing_arxiv_id":"2504.13074","citing_title":"SkyReels-V2: Infinite-length Film Generative Model","ref_index":75,"is_internal_anchor":true},{"citing_arxiv_id":"2503.21755","citing_title":"VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04142","citing_title":"OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17565","citing_title":"UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10218","citing_title":"Relative Score Policy Optimization for Diffusion Language Models","ref_index":104,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24416","citing_title":"Scaling Properties of Continuous Diffusion Spoken Language Models","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01725","citing_title":"Motion-Aware Caching for Efficient Autoregressive Video Generation","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16503","citing_title":"Motif-Video 2B: Technical Report","ref_index":54,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10103","citing_title":"Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07079","citing_title":"Learning Visual Feature-Based World Models via Residual Latent Action","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06339","citing_title":"Evolution of Video Generative Foundations","ref_index":176,"is_internal_anchor":true},{"citing_arxiv_id":"2604.04335","citing_title":"GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads","ref_index":45,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK","json":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK.json","graph_json":"https://pith.science/api/pith-number/KSL44R7PRTQF47XVXVL6LVJ3YK/graph.json","events_json":"https://pith.science/api/pith-number/KSL44R7PRTQF47XVXVL6LVJ3YK/events.json","paper":"https://pith.science/paper/KSL44R7P"},"agent_actions":{"view_html":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK","download_json":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK.json","view_paper":"https://pith.science/paper/KSL44R7P","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2503.09642&json=true","fetch_graph":"https://pith.science/api/pith-number/KSL44R7PRTQF47XVXVL6LVJ3YK/graph.json","fetch_events":"https://pith.science/api/pith-number/KSL44R7PRTQF47XVXVL6LVJ3YK/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK/action/timestamp_anchor","attest_storage":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK/action/storage_attestation","attest_author":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK/action/author_attestation","sign_citation":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK/action/citation_signature","submit_replication":"https://pith.science/pith/KSL44R7PRTQF47XVXVL6LVJ3YK/action/replication_record"}},"created_at":"2026-05-17T23:38:47.922849+00:00","updated_at":"2026-05-17T23:38:47.922849+00:00"}