{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:PT4O7RD2MTGABAMLO3TTC6F6X6","short_pith_number":"pith:PT4O7RD2","schema_version":"1.0","canonical_sha256":"7cf8efc47a64cc00818b76e73178bebf9b11782306f29328fb05b3994361d6a1","source":{"kind":"arxiv","id":"2512.13507","version":3},"attestation_state":"computed","paper":{"title":"Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Seedance 1.5 pro is a joint audio-visual generation model achieving high synchronization via dual-branch diffusion transformer and post-training optimizations.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Ai Li, Ashley Kim, Bangbang Yang, Benhui Zou, Bibo He, Bowen Yu, Boyang Hao, Ceyuan Yang, Chao Liang, Chengjian Feng, Chujie Yuan, Chuntao Zhang, Di Wu, Dong Guo, Donglei Ji, Fangfang Liu, Fan Sun, Feilong Zuo, Fei Xiao, Feiya Li, Feng Cheng, Feng Ling, Feng Wang, Fengxuan Zhao, Furui Wang, Gaohong Liu, Gen Li, Guang Shi, Guohong Wu, Guoqiang Wei, Han Feng, Hanjie Wu, Han Liang, Han Mu, Hao Zheng, Heng Lin, Heng Zhang, Heyi Chen, Huating Zhao, Huixia Li, Jiahui Zhu, Jianan Kong, Jianbin Zheng, Jian Cong, Jian Wu, Jian Yu, Jianzhong Liang, Jiaqi Yang, Jiashi Li, Jiawei Liu, Jie Liu, Jie Wu, Jiexin Zhou, Jihao Liu, Jing Cui, Jing Fang, Jingjie Zhang, Jingzhe Ning, Jinlan Xue, Jinran Wang, Junkai Wang, Junliang Fan, Junlin Lyu, Kai Shen, Kengyu Lin, Ke Wang, Kexin Wang, Kuan Zhu, Kuo Zhang, Lecheng Lyu, Lei Shi, Liang Li, Liang Xiang, Liang Zhang, Lianke Qin, Linxiao Yuan, Li Sun, Liying Zhang, Manlin Zhang, Ming Li, Mingyuan Gao, Pan Xie, Qian He, Qian Lyu, Qide Dong, Qingkai Hao, Qingyi Wang, Qinpeng Cui, Qiushan Guo, Renfei Sun, Ruiqi Xia, Rui Wang, Runkai Yang, Ruolan Wu, Ruoqing Hu, Sen Wang, Shanchuan Lin, Shanshan Lao, Shanshan Li, Shenhan Zhu, Shen Yan, Shouda Liu, Shuai Wang, Shuang Xu, Shuangyi Xie, Shu Liu, Sichao Liu, Sichun Zeng, Siqi Jiang, Siyan Chen, Songting Yao, Songwei Liu, Tao Li, Tao Yang, Team Seedance, Tianheng Cheng, Tingru Wang, Ting Zhang, Tuyen Hoang, Wang Liao, Wanru Wei, Weichen Wang, Weida Zhang, Weihong Zeng, Wei Jiang, Weilin Huang, Wenjia Zhu, Wenjing Tang, Xian Li, Xiaohe Zhang, Xiaojie Li, Xiaonan Nie, Xiaoyang Li, Xiaozheng Zheng, Xi Hu, Xi Lin, Xin Chen, Xinglong Wu, Xingxing Li, Xin Liu, Xinqi Cheng, Xin Wang, Xinyan Zhang, Xitong Pan, Xuefeng Xiao, Xuejiao Zeng, Xue Liu, Xueqiong Qu, Xuyan Chi, Yalin Liao, Yameng Li, Yanfei Chen, Yanghua Peng, Yang Yang, Yangyang Zheng, Yang Zhao, Yanhui Wang, Yan Song, Yan Sun, Yan Zeng, Yan Zhang, Yaxue Tang, Yibo Liu, Yichong Leng, Yifan Yao, Yifu Li, Yihang Yang, Yijie Zheng, Ying Chen, Ying Liang, Yinglong Song, Yiying Li, Yonghui Wu, Yuan Zhang, Yue Wang, Yu Gao, Yunpu Jiang, Yuping Wang, Yuxi Ren, Yuxuan Wang, Zetao Fang, Zeyu Sun, Zhaoyang Huang, Zhichao Lai, Zhijie Lin, Zhiqiang Liang, Zhixian Yang, Zhongyi Huang, Zhuo Chen, Zhuo Jiang, Zikun Liu, Zilyu Ye, Zirui Tao, Zixiang Zhang, Ziyan Yang, Ziyu Wang, Zuxi Liu","submitted_at":"2025-12-15T16:36:52Z","abstract_excerpt":"Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality dat"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2512.13507","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2025-12-15T16:36:52Z","cross_cats_sorted":[],"title_canon_sha256":"d71dc26ed6ed8d1f3da12985a479e85d4ad926897d65d9d0702834503462bc37","abstract_canon_sha256":"1ba1c0b66114e0bae38a6b1544a15faf31d63a716804cb3c6ad9e335a5624171"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:49.479496Z","signature_b64":"p6M3E3I3/h1MQ2i2CRjJT5PjdQzMH6oB2uKE8jN2WVIgLB42ifrX1yUkRbCTk9N/OUTL4j1a9p2Lnt2a+02vDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"7cf8efc47a64cc00818b76e73178bebf9b11782306f29328fb05b3994361d6a1","last_reissued_at":"2026-05-17T23:38:49.478903Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:49.478903Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Seedance 1.5 pro is a joint audio-visual generation model achieving high synchronization via dual-branch diffusion transformer and post-training optimizations.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Ai Li, Ashley Kim, Bangbang Yang, Benhui Zou, Bibo He, Bowen Yu, Boyang Hao, Ceyuan Yang, Chao Liang, Chengjian Feng, Chujie Yuan, Chuntao Zhang, Di Wu, Dong Guo, Donglei Ji, Fangfang Liu, Fan Sun, Feilong Zuo, Fei Xiao, Feiya Li, Feng Cheng, Feng Ling, Feng Wang, Fengxuan Zhao, Furui Wang, Gaohong Liu, Gen Li, Guang Shi, Guohong Wu, Guoqiang Wei, Han Feng, Hanjie Wu, Han Liang, Han Mu, Hao Zheng, Heng Lin, Heng Zhang, Heyi Chen, Huating Zhao, Huixia Li, Jiahui Zhu, Jianan Kong, Jianbin Zheng, Jian Cong, Jian Wu, Jian Yu, Jianzhong Liang, Jiaqi Yang, Jiashi Li, Jiawei Liu, Jie Liu, Jie Wu, Jiexin Zhou, Jihao Liu, Jing Cui, Jing Fang, Jingjie Zhang, Jingzhe Ning, Jinlan Xue, Jinran Wang, Junkai Wang, Junliang Fan, Junlin Lyu, Kai Shen, Kengyu Lin, Ke Wang, Kexin Wang, Kuan Zhu, Kuo Zhang, Lecheng Lyu, Lei Shi, Liang Li, Liang Xiang, Liang Zhang, Lianke Qin, Linxiao Yuan, Li Sun, Liying Zhang, Manlin Zhang, Ming Li, Mingyuan Gao, Pan Xie, Qian He, Qian Lyu, Qide Dong, Qingkai Hao, Qingyi Wang, Qinpeng Cui, Qiushan Guo, Renfei Sun, Ruiqi Xia, Rui Wang, Runkai Yang, Ruolan Wu, Ruoqing Hu, Sen Wang, Shanchuan Lin, Shanshan Lao, Shanshan Li, Shenhan Zhu, Shen Yan, Shouda Liu, Shuai Wang, Shuang Xu, Shuangyi Xie, Shu Liu, Sichao Liu, Sichun Zeng, Siqi Jiang, Siyan Chen, Songting Yao, Songwei Liu, Tao Li, Tao Yang, Team Seedance, Tianheng Cheng, Tingru Wang, Ting Zhang, Tuyen Hoang, Wang Liao, Wanru Wei, Weichen Wang, Weida Zhang, Weihong Zeng, Wei Jiang, Weilin Huang, Wenjia Zhu, Wenjing Tang, Xian Li, Xiaohe Zhang, Xiaojie Li, Xiaonan Nie, Xiaoyang Li, Xiaozheng Zheng, Xi Hu, Xi Lin, Xin Chen, Xinglong Wu, Xingxing Li, Xin Liu, Xinqi Cheng, Xin Wang, Xinyan Zhang, Xitong Pan, Xuefeng Xiao, Xuejiao Zeng, Xue Liu, Xueqiong Qu, Xuyan Chi, Yalin Liao, Yameng Li, Yanfei Chen, Yanghua Peng, Yang Yang, Yangyang Zheng, Yang Zhao, Yanhui Wang, Yan Song, Yan Sun, Yan Zeng, Yan Zhang, Yaxue Tang, Yibo Liu, Yichong Leng, Yifan Yao, Yifu Li, Yihang Yang, Yijie Zheng, Ying Chen, Ying Liang, Yinglong Song, Yiying Li, Yonghui Wu, Yuan Zhang, Yue Wang, Yu Gao, Yunpu Jiang, Yuping Wang, Yuxi Ren, Yuxuan Wang, Zetao Fang, Zeyu Sun, Zhaoyang Huang, Zhichao Lai, Zhijie Lin, Zhiqiang Liang, Zhixian Yang, Zhongyi Huang, Zhuo Chen, Zhuo Jiang, Zikun Liu, Zilyu Ye, Zirui Tao, Zixiang Zhang, Ziyan Yang, Ziyu Wang, Zuxi Liu","submitted_at":"2025-12-15T16:36:52Z","abstract_excerpt":"Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality dat"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The cross-modal joint module and multi-stage data pipeline combined with SFT and RLHF produce the stated synchronization and quality levels, an assumption stated without supporting metrics or comparisons in the provided abstract.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Seedance 1.5 pro is a joint audio-visual generation model achieving high synchronization via dual-branch diffusion transformer and post-training optimizations.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"de06d01e1f6cbe9c68f7315282652b52cd145134ef5ae5af483a4b7a4f5f5c05"},"source":{"id":"2512.13507","kind":"arxiv","version":3},"verdict":{"id":"cfa33167-a27d-4ae3-814f-56d9bd8ae06c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T01:28:08.039043Z","strongest_claim":"Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality.","one_line_summary":"Seedance 1.5 pro is a joint audio-visual generation model achieving high synchronization via dual-branch diffusion transformer and post-training optimizations.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The cross-modal joint module and multi-stage data pipeline combined with SFT and RLHF produce the stated synchronization and quality levels, an assumption stated without supporting metrics or comparisons in the provided abstract.","pith_extraction_headline":""},"references":{"count":16,"sample":[{"doi":"","year":2024,"title":"Scaling rectified flow transformers for high-resolution image synthesis","work_id":"3d3628d5-0f93-4e87-a1fd-d22fed308442","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Seedream 3.0 Technical Report","work_id":"013e56d0-7f47-4d0e-bbca-e9540fc0e0cc","ref_index":2,"cited_arxiv_id":"2504.11346","is_internal_anchor":true},{"doi":"","year":2025,"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","work_id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","ref_index":3,"cited_arxiv_id":"2506.09113","is_internal_anchor":true},{"doi":"","year":2025,"title":"Mean Flows for One-step Generative Modeling","work_id":"07a52ad5-0f82-4095-9a66-559b09fea1ae","ref_index":4,"cited_arxiv_id":"2505.13447","is_internal_anchor":true},{"doi":"","year":2025,"title":"Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model","work_id":"e285b9d3-0bf4-4f98-ba3a-e545425ab960","ref_index":5,"cited_arxiv_id":"2503.07703","is_internal_anchor":true}],"resolved_work":16,"snapshot_sha256":"c3f3057f6d503a19c5afc815638bf3941ba38c7e8b0ca4e924963c0c8d7ddac9","internal_anchors":11},"formal_canon":{"evidence_count":2,"snapshot_sha256":"3609a2d4dc54473b3a70bc95e96584e3c6cbb9667df9093bc2ca2019333e39cf"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2512.13507","created_at":"2026-05-17T23:38:49.479010+00:00"},{"alias_kind":"arxiv_version","alias_value":"2512.13507v3","created_at":"2026-05-17T23:38:49.479010+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2512.13507","created_at":"2026-05-17T23:38:49.479010+00:00"},{"alias_kind":"pith_short_12","alias_value":"PT4O7RD2MTGA","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"PT4O7RD2MTGABAML","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"PT4O7RD2","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2605.22570","citing_title":"VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27505","citing_title":"Leveraging Verifier-Based Reinforcement Learning in Image Editing","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16503","citing_title":"Motif-Video 2B: Technical Report","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2601.23286","citing_title":"VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2601.20540","citing_title":"Advancing Open-source World Models","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12179","citing_title":"SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12480","citing_title":"OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03652","citing_title":"AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2604.27505","citing_title":"Leveraging Verifier-Based Reinforcement Learning in Image Editing","ref_index":48,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03652","citing_title":"AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24953","citing_title":"ViPO: Visual Preference Optimization at Scale","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19193","citing_title":"How Far Are Video Models from True Multimodal Reasoning?","ref_index":57,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11804","citing_title":"OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2604.11521","citing_title":"Continuous Adversarial Flow Models","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16503","citing_title":"Motif-Video 2B: Technical Report","ref_index":34,"is_internal_anchor":true},{"citing_arxiv_id":"2604.10980","citing_title":"Tracking High-order Evolutions via Cascading Low-rank Fitting","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08540","citing_title":"AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08363","citing_title":"CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.07958","citing_title":"ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.07503","citing_title":"Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14148","citing_title":"Seedance 2.0: Advancing Video Generation for World Complexity","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2604.15911","citing_title":"Efficient Video Diffusion Models: Advancements and Challenges","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18326","citing_title":"OmniHuman: A Large-scale Dataset and Benchmark for Human-Centric Video Generation","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03652","citing_title":"AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2605.01761","citing_title":"TrajShield: Trajectory-Level Safety Mediation for Defending Text-to-Video Models Against Jailbreak Attacks","ref_index":5,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6","json":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6.json","graph_json":"https://pith.science/api/pith-number/PT4O7RD2MTGABAMLO3TTC6F6X6/graph.json","events_json":"https://pith.science/api/pith-number/PT4O7RD2MTGABAMLO3TTC6F6X6/events.json","paper":"https://pith.science/paper/PT4O7RD2"},"agent_actions":{"view_html":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6","download_json":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6.json","view_paper":"https://pith.science/paper/PT4O7RD2","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2512.13507&json=true","fetch_graph":"https://pith.science/api/pith-number/PT4O7RD2MTGABAMLO3TTC6F6X6/graph.json","fetch_events":"https://pith.science/api/pith-number/PT4O7RD2MTGABAMLO3TTC6F6X6/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6/action/timestamp_anchor","attest_storage":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6/action/storage_attestation","attest_author":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6/action/author_attestation","sign_citation":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6/action/citation_signature","submit_replication":"https://pith.science/pith/PT4O7RD2MTGABAMLO3TTC6F6X6/action/replication_record"}},"created_at":"2026-05-17T23:38:49.479010+00:00","updated_at":"2026-05-17T23:38:49.479010+00:00"}