{"work":{"id":"c4d59581-ef40-441f-937a-e325bff802be","openalex_id":null,"doi":null,"arxiv_id":"2510.08431","raw_key":null,"title":"Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency","authors":null,"authors_text":null,"year":2025,"venue":"cs.CV","abstract":"Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-image and video tasks remains unclear due to infrastructure challenges in Jacobian-vector product (JVP) computation and the limitations of evaluation benchmarks like FID. This work represents the first effort to scale up continuous-time consistency to general application-level image and video diffusion models, and to make JVP-based distillation effective at large scale. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the \"mode-covering\" nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. This integration complements sCM with the \"mode-seeking\" reverse divergence, effectively improving visual quality while maintaining high generation diversity. Validated on large-scale models (Cosmos-Predict2, Wan2.1) up to 14B parameters and 5-second videos, rCM generally matches the state-of-the-art distillation method DMD2 on quality metrics while mitigating mode collapse and offering notable advantages in diversity, all without GAN tuning or extensive hyperparameter searches. The distilled models generate high-fidelity samples in only $1\\sim4$ steps, accelerating diffusion sampling by $15\\times\\sim50\\times$. These results position rCM as a practical and theoretically grounded framework for advancing large-scale diffusion distillation. Code is available at https://github.com/NVlabs/rcm.","external_url":"https://arxiv.org/abs/2510.08431","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T06:35:24.888542+00:00","pith_arxiv_id":"2510.08431","created_at":"2026-05-10T09:03:26.185048+00:00","updated_at":"2026-05-25T06:35:24.888542+00:00","title_quality_ok":true,"display_title":"Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency","render_title":"Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency"},"hub":{"state":{"work_id":"c4d59581-ef40-441f-937a-e325bff802be","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":18,"external_cited_by_count":null,"distinct_field_count":3,"first_pith_cited_at":"2025-10-28T22:44:13+00:00","last_pith_cited_at":"2026-05-20T11:24:02+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-28T01:08:03.157720+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":5},{"context_role":"baseline","n":2},{"context_role":"dataset","n":1}],"polarity_counts":[{"context_polarity":"background","n":5},{"context_polarity":"baseline","n":2},{"context_polarity":"use_dataset","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}