{"work":{"id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","openalex_id":null,"doi":null,"arxiv_id":"2506.09113","raw_key":null,"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","authors":null,"authors_text":"Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong","year":2025,"venue":"cs.CV","abstract":"Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precision and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with proposed training paradigm, which allows for natively supporting multi-shot generation and jointly learning of both text-to-video and image-to-video tasks. (iii) carefully-optimized post-training approaches leveraging fine-grained supervised fine-tuning, and video-specific RLHF with multi-dimensional reward mechanisms for comprehensive performance improvements; (iv) excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution only with 41.4 seconds (NVIDIA-L20). 
Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation having superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, native multi-shot narrative coherence with consistent subject representation.","external_url":"https://arxiv.org/abs/2506.09113","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-06-29T13:03:26.190486+00:00","pith_arxiv_id":"2506.09113","created_at":"2026-05-08T18:28:57.565030+00:00","updated_at":"2026-06-29T13:03:26.190486+00:00","title_quality_ok":true,"display_title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","render_title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models"},"hub":{"state":{"work_id":"b2e36b5d-99e4-45b4-9358-64f6d3501983","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":56,"external_cited_by_count":null,"distinct_field_count":8,"first_pith_cited_at":"2025-10-09T16:45:30+00:00","last_pith_cited_at":"2026-06-24T06:45:56+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T13:08:47.694285+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":20},{"context_role":"baseline","n":4},{"context_role":"method","n":1}],"polarity_counts":[{"context_polarity":"background","n":20},{"context_polarity":"baseline","n":4},{"context_polarity":"use_method","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T18:20:08.378150+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Wan: Open and Advanced Large-Scale Video Generative 
Models","work_id":"ad3ebc3b-4224-46c9-b61d-bcf135da0a7c","shared_citers":27},{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","work_id":"881efa7e-7e73-4c66-9cc3-2803e551061c","shared_citers":21},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":14},{"title":"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets","work_id":"4f68eada-27e3-437a-a2fe-6e4ca524d0d3","shared_citers":12},{"title":"Seedance 1.5 pro: A native audio-visual joint generation foundation model","work_id":"02efb197-4e2b-4141-8587-21b92ad92f08","shared_citers":9},{"title":"Movie Gen: A Cast of Media Foundation Models","work_id":"a6a118b0-002f-4b19-881f-7f1183e0d7d8","shared_citers":8},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":7},{"title":"Hunyuanvideo 1.5 technical report","work_id":"ed898a38-b053-407c-bbce-41561510c1de","shared_citers":7},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":7},{"title":"AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning","work_id":"1f9d1d3b-a6d6-45a9-9f13-51393c03be8a","shared_citers":6},{"title":"arXiv preprint arXiv:2504.13074 (2025) 4 16 Xingtong Ge et al","work_id":"2ce11350-273e-4f0d-ae78-292aa3151060","shared_citers":6},{"title":"LTX-Video: Realtime Video Latent Diffusion","work_id":"cee5c521-3ce9-466e-a035-1e42f89254f4","shared_citers":6},{"title":"Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion","work_id":"53e58ef9-7932-4b83-b757-34ac14db3e0f","shared_citers":6},{"title":"Towards Accurate Generative Models of Video: A New Metric & Challenges","work_id":"72f42543-17d5-49aa-ba5a-25d67ffbb88a","shared_citers":6},{"title":"Vbench-2.0: Advancing video generation benchmark suite for intrinsic 
faithfulness","work_id":"14060202-ac5f-48e9-b91a-24d150775431","shared_citers":6},{"title":"arXiv preprint arXiv:2503.09642 (2025)","work_id":"c22f9060-268c-4cc9-8018-dee486a23da1","shared_citers":5},{"title":"arXiv preprint arXiv:2509.22622 (2025) 2, 3, 4, 10, 11, 12, 13, 21, 22","work_id":"29ec6550-6ac1-4cc6-8aae-b4206f1d5b38","shared_citers":5},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":5},{"title":"Cosmos World Foundation Model Platform for Physical AI","work_id":"a2dba24c-318d-476a-8b21-4289c265810c","shared_citers":5},{"title":"Emerging Properties in Unified Multimodal Pretraining","work_id":"e0cfd82c-f5d4-44fd-b531-ec73ab0a805b","shared_citers":5},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":5},{"title":"Make-A-Video: Text-to-Video Generation without Text-Video Data","work_id":"52a801fc-a707-45a1-a8cd-0d6702f124ab","shared_citers":5},{"title":"Open-Sora: Democratizing Efficient Video Production for All","work_id":"8b29ba7b-3d84-4281-85b7-9eaf905afd7f","shared_citers":5},{"title":"Step-video-t2v technical report: The practice, challenges, and future of video foundation model","work_id":"f9be7275-0997-4f96-bbde-dcd6ea138476","shared_citers":5}],"time_series":[{"n":1,"year":2025},{"n":32,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T18:19:37.667644+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical 
Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T18:19:46.855768+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","claims":[{"claim_text":"Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precision and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture d","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Seedance 1.0: Exploring the Boundaries of Video Generation Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T18:20:04.348464+00:00"}},"summary":{"title":"Seedance 1.0: Exploring the Boundaries of Video Generation Models","claims":[{"claim_text":"Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundational model still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. 
In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precision and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture d","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Seedance 1.0: Exploring the Boundaries of Video Generation Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"Wan: Open and Advanced Large-Scale Video Generative Models","work_id":"ad3ebc3b-4224-46c9-b61d-bcf135da0a7c","shared_citers":27},{"title":"HunyuanVideo: A Systematic Framework For Large Video Generative Models","work_id":"881efa7e-7e73-4c66-9cc3-2803e551061c","shared_citers":21},{"title":"CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer","work_id":"f38fc088-12aa-4bf4-9ecd-08d3e797ccb7","shared_citers":14},{"title":"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets","work_id":"4f68eada-27e3-437a-a2fe-6e4ca524d0d3","shared_citers":12},{"title":"Seedance 1.5 pro: A native audio-visual joint generation foundation model","work_id":"02efb197-4e2b-4141-8587-21b92ad92f08","shared_citers":9},{"title":"Movie Gen: A Cast of Media Foundation Models","work_id":"a6a118b0-002f-4b19-881f-7f1183e0d7d8","shared_citers":8},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":7},{"title":"Hunyuanvideo 1.5 technical report","work_id":"ed898a38-b053-407c-bbce-41561510c1de","shared_citers":7},{"title":"Qwen3-VL Technical Report","work_id":"1fe243aa-e3c0-4da6-b391-4cbcfc88d5c0","shared_citers":7},{"title":"AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning","work_id":"1f9d1d3b-a6d6-45a9-9f13-51393c03be8a","shared_citers":6},{"title":"arXiv 
preprint arXiv:2504.13074 (2025) 4 16 Xingtong Ge et al","work_id":"2ce11350-273e-4f0d-ae78-292aa3151060","shared_citers":6},{"title":"LTX-Video: Realtime Video Latent Diffusion","work_id":"cee5c521-3ce9-466e-a035-1e42f89254f4","shared_citers":6},{"title":"Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion","work_id":"53e58ef9-7932-4b83-b757-34ac14db3e0f","shared_citers":6},{"title":"Towards Accurate Generative Models of Video: A New Metric & Challenges","work_id":"72f42543-17d5-49aa-ba5a-25d67ffbb88a","shared_citers":6},{"title":"Vbench-2.0: Advancing video generation benchmark suite for intrinsic faithfulness","work_id":"14060202-ac5f-48e9-b91a-24d150775431","shared_citers":6},{"title":"arXiv preprint arXiv:2503.09642 (2025)","work_id":"c22f9060-268c-4cc9-8018-dee486a23da1","shared_citers":5},{"title":"arXiv preprint arXiv:2509.22622 (2025) 2, 3, 4, 10, 11, 12, 13, 21, 22","work_id":"29ec6550-6ac1-4cc6-8aae-b4206f1d5b38","shared_citers":5},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":5},{"title":"Cosmos World Foundation Model Platform for Physical AI","work_id":"a2dba24c-318d-476a-8b21-4289c265810c","shared_citers":5},{"title":"Emerging Properties in Unified Multimodal Pretraining","work_id":"e0cfd82c-f5d4-44fd-b531-ec73ab0a805b","shared_citers":5},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":5},{"title":"Make-A-Video: Text-to-Video Generation without Text-Video Data","work_id":"52a801fc-a707-45a1-a8cd-0d6702f124ab","shared_citers":5},{"title":"Open-Sora: Democratizing Efficient Video Production for All","work_id":"8b29ba7b-3d84-4281-85b7-9eaf905afd7f","shared_citers":5},{"title":"Step-video-t2v technical report: The practice, challenges, and future of video foundation 
model","work_id":"f9be7275-0997-4f96-bbde-dcd6ea138476","shared_citers":5}],"time_series":[{"n":1,"year":2025},{"n":32,"year":2026}],"dependency_candidates":[]},"authors":[]}}