{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:FFRRDCREJJUZRFZ6YU4U2AJ4QL","short_pith_number":"pith:FFRRDCRE","schema_version":"1.0","canonical_sha256":"2963118a244a6998973ec5394d013c82fc48caa433ff7c86e5a5c88267c61114","source":{"kind":"arxiv","id":"2509.25161","version":1},"attestation_state":"computed","paper":{"title":"Rolling Forcing: Autoregressive Long Video Diffusion in Real Time","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Rolling Forcing generates multi-minute streaming videos in real time by jointly denoising frames with rising noise levels and anchoring attention to early frames.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Jiale Xu, Kunhao Liu, Shijian Lu, Wenbo Hu, Ying Shan","submitted_at":"2025-09-29T17:57:14Z","abstract_excerpt":"Streaming video generation, as one fundamental component in interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from severe error accumulation that often significantly degrades the generated stream videos over long horizons. We design Rolling Forcing, a novel video generation technique that enables streaming long videos with minimal error accumulation. Rolling Forcing comes with three novel designs. First, instead of iteratively sampling individual frames, which accelerates"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2509.25161","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.CV","submitted_at":"2025-09-29T17:57:14Z","cross_cats_sorted":[],"title_canon_sha256":"565c311a9827024869ebe4ced78d9f391852fb878c898d69dc5a436fc70301aa","abstract_canon_sha256":"e9032be0502e0ad2c4660053ff4a97c0fd3efa9a39acfe770bf1e3d6be8248e4"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:48.060517Z","signature_b64":"kRe8ob1khVEv/2VJpMbthb9+stJ/mhFxhEhKo8Bq94OzM2kzvJYtylk8+fAu5Ij7Qgo2d+pi4fpvgsQBMgbjBg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"2963118a244a6998973ec5394d013c82fc48caa433ff7c86e5a5c88267c61114","last_reissued_at":"2026-05-17T23:38:48.060007Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:48.060007Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Rolling Forcing: Autoregressive Long Video Diffusion in Real Time","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Rolling Forcing generates multi-minute streaming videos in real time by jointly denoising frames with rising noise levels and anchoring attention to early frames.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Jiale Xu, Kunhao Liu, Shijian Lu, Wenbo Hu, Ying Shan","submitted_at":"2025-09-29T17:57:14Z","abstract_excerpt":"Streaming video generation, as one fundamental component in interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from severe error accumulation that often significantly degrades the generated stream videos over long horizons. We design Rolling Forcing, a novel video generation technique that enables streaming long videos with minimal error accumulation. Rolling Forcing comes with three novel designs. First, instead of iteratively sampling individual frames, which accelerates"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Rolling Forcing enables real-time streaming generation of multi-minute videos on a single GPU, with substantially reduced error accumulation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The joint denoising scheme with progressively increasing noise levels, combined with the attention sink and windowed distillation, actually suppresses error accumulation over long horizons without introducing new quality degradations or artifacts.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Rolling Forcing generates multi-minute streaming videos in real time by jointly denoising frames with rising noise levels and anchoring attention to early frames.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"b1d9ec0ae4d5a76a18fb3edac5f259b74b9ed82d1f1ae6c949c60327161403eb"},"source":{"id":"2509.25161","kind":"arxiv","version":1},"verdict":{"id":"cbb61036-9a1f-4542-a62d-633c46c92f2d","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T11:11:36.428170Z","strongest_claim":"Rolling Forcing enables real-time streaming generation of multi-minute videos on a single GPU, with substantially reduced error accumulation.","one_line_summary":"Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The joint denoising scheme with progressively increasing noise levels, combined with the attention sink and windowed distillation, actually suppresses error accumulation over long horizons without introducing new quality degradations or artifacts.","pith_extraction_headline":"Rolling Forcing generates multi-minute streaming videos in real time by jointly denoising frames with rising noise levels and anchoring attention to early frames."},"references":{"count":75,"sample":[{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=","work_id":"da08a965-d658-4a8c-9a9b-5252efd12bd3","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF international conference on computer vision , pages=","work_id":"8d3af2d5-f2a3-4ce9-8358-51fef7d65c4a","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"European Conference on Computer Vision , pages=","work_id":"446e85fd-a22a-4435-abc6-683f1ef5f164","ref_index":7,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Forty-first International Conference on Machine Learning , year=","work_id":"281442db-f65a-45d0-b97f-e65a1bc2460c","ref_index":12,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems , volume=","work_id":"7b57bb1a-4465-47e2-af2b-e0d912ad1da1","ref_index":17,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":75,"snapshot_sha256":"45f3d8b947f2245d1336705d84ce8e026ba8ffdded2a1cae64db131e43de34e6","internal_anchors":19},"formal_canon":{"evidence_count":2,"snapshot_sha256":"d93fe6c8aa68217aac8fe63b4de32d750cf8778eceb123f25f187a488e3d65d5"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2509.25161","created_at":"2026-05-17T23:38:48.060097+00:00"},{"alias_kind":"arxiv_version","alias_value":"2509.25161v1","created_at":"2026-05-17T23:38:48.060097+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2509.25161","created_at":"2026-05-17T23:38:48.060097+00:00"},{"alias_kind":"pith_short_12","alias_value":"FFRRDCREJJUZ","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"FFRRDCREJJUZRFZ6","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"FFRRDCRE","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":32,"internal_anchor_count":32,"sample":[{"citing_arxiv_id":"2602.02214","citing_title":"Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22718","citing_title":"WorldKV: Efficient World Memory with World Retrieval and Compression","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2602.02214","citing_title":"Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21028","citing_title":"DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20910","citing_title":"FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21072","citing_title":"Q-ARVD: Quantizing Autoregressive Video Diffusion Models","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15824","citing_title":"FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16003","citing_title":"Echo-Forcing: A Scene Memory Framework for Interactive Long Video Generation","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16147","citing_title":"Registers Matter for Pixel-Space Diffusion Transformers","ref_index":33,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18733","citing_title":"Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18739","citing_title":"LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17019","citing_title":"StreamingEffect: Real-Time Human-Centric Video Effect Generation","ref_index":37,"is_internal_anchor":true},{"citing_arxiv_id":"2512.04677","citing_title":"Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2512.04678","citing_title":"Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation","ref_index":45,"is_internal_anchor":true},{"citing_arxiv_id":"2602.06949","citing_title":"DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2602.07775","citing_title":"Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion","ref_index":63,"is_internal_anchor":true},{"citing_arxiv_id":"2510.02283","citing_title":"Self-Forcing++: Towards Minute-Scale High-Quality Video Generation","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"2512.14614","citing_title":"WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15199","citing_title":"EntityBench: Towards Entity-Consistent Long-Range Multi-Shot Video Generation","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14487","citing_title":"Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03118","citing_title":"Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12496","citing_title":"CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08729","citing_title":"Unison: Harmonizing Motion, Speech, and Sound for Human-Centric Audio-Video Generation","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03849","citing_title":"Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06509","citing_title":"FreeSpec: Training-Free Long Video Generation via Singular-Spectrum Reconstruction","ref_index":31,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL","json":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL.json","graph_json":"https://pith.science/api/pith-number/FFRRDCREJJUZRFZ6YU4U2AJ4QL/graph.json","events_json":"https://pith.science/api/pith-number/FFRRDCREJJUZRFZ6YU4U2AJ4QL/events.json","paper":"https://pith.science/paper/FFRRDCRE"},"agent_actions":{"view_html":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL","download_json":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL.json","view_paper":"https://pith.science/paper/FFRRDCRE","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2509.25161&json=true","fetch_graph":"https://pith.science/api/pith-number/FFRRDCREJJUZRFZ6YU4U2AJ4QL/graph.json","fetch_events":"https://pith.science/api/pith-number/FFRRDCREJJUZRFZ6YU4U2AJ4QL/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL/action/timestamp_anchor","attest_storage":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL/action/storage_attestation","attest_author":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL/action/author_attestation","sign_citation":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL/action/citation_signature","submit_replication":"https://pith.science/pith/FFRRDCREJJUZRFZ6YU4U2AJ4QL/action/replication_record"}},"created_at":"2026-05-17T23:38:48.060097+00:00","updated_at":"2026-05-17T23:38:48.060097+00:00"}